Transcriptional regulators and methods thereof

ABSTRACT

The invention relates to transcriptional regulators and related methods thereof. The invention further relates to the identification of genes regulated by transcriptional regulators, to the treatment of diseases associated with abnormal function of a transcriptional regulator and to the modulation of gene expression, including genes expressed in hepatocytes or pancreatic cells, through the modulation of transcriptional regulator activity.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of the filing date of U.S. Application No. 60/525,318, filed Nov. 26, 2003, entitled “CONTROL OF PANCREAS AND LIVER GENE EXPRESSION BY HNF TRANSCRIPTION FACTORS”, U.S. Application No. 60/542,520, filed Feb. 6, 2004, entitled “CONTROL OF PANCREAS AND LIVER GENE EXPRESSION BY HNF TRANSCRIPTION FACTORS”, U.S. Application No. 60/544,835, filed Feb. 13, 2004, entitled “CONTROL OF PANCREAS AND LIVER GENE EXPRESSION BY HNF TRANSCRIPTION FACTORS”, and U.S. Application No. 60/547,933, filed Feb. 26, 2004, entitled “TRANSCRIPTIONAL REGULATORS AND METHODS THEREOF”. The entire teachings of the referenced applications are incorporated by reference herein.

FUNDING

The invention described herein was supported, in whole or in part, by the U.S. Department of Energy Program for Computational Molecular Biology. The United States government has certain rights in the invention.

BACKGROUND OF THE INVENTION

Gene expression is controlled by transcriptional regulatory proteins, which bind specific DNA sequences and recruit cofactors and the transcription apparatus to promoters (1-3). The expression of transcriptional regulators themselves is also regulated by transcriptional regulators, and a single gene may be regulated by multiple transcription factors. As a result of these regulatory networks, or pathways, misregulation of a single transcriptional regulator in a cell can result in the aberrant expression of multiple genes in the network in which the transcriptional regulator is active, leading to disease in the organism.

Current methods of identifying the genes controlled by a transcriptional regulator typically include a comparison of the mRNA levels of candidate target genes in cells which express the transcriptional regulator and in control cells which do not express it. Often, this involves overexpressing a recombinant transcriptional regulator in a given cell type and using, as a control cell, one which overexpresses a control recombinant protein or no recombinant protein at all. However, given the artificial nature of using cell lines and overexpressing transgenes, the results obtained from such approaches may not reflect the in vivo regulation by native transcriptional regulators in an organism.

Genome-wide analysis methods have been used recently to determine how tagged transcriptional regulators encoded in Saccharomyces cerevisiae are associated with the genome in living yeast cells and to model the transcriptional regulatory circuitry of these cells (4). These methods have also been used in human tissue culture cells to identify target genes for several transcriptional regulators (5-7).

However, the need remains to develop genome-scale analysis methods to determine how transcriptional regulators control the global gene expression programs that characterize specific tissues, and in particular, freshly isolated, primary tissues, in which the transcriptional regulators are likely to maintain their in vivo specificities. Furthermore, there is a need to identify the regulatory networks or pathways in which a given transcriptional activator acts, in part, to allow for the identification of therapeutic targets for diseases caused by aberrant function of a transcriptional regulator.

SUMMARY OF THE INVENTION

In one aspect, the invention provides a method of identifying the genes regulated by a transcriptional regulator. One aspect of the invention provides a method of determining which genes from a subset of genes are regulated by a transcriptional regulator in a cell, the method comprising (a) selectively isolating chromatin from a cell which expresses the transcriptional regulator to generate isolated chromatin; (b) selectively isolating chromatin fragments from the isolated chromatin to generate bound chromatin fragments, wherein the bound chromatin fragments are bound by the transcriptional regulator; (c) amplifying both the bound chromatin fragments to generate amplified chromatin fragments and the isolated chromatin to generate amplified control chromatin; (d) hybridizing the amplified control chromatin and the amplified chromatin fragments to a DNA microarray, wherein the DNA microarray comprises (1) at least 10,000 experimental spots, each experimental spot comprising an experimental DNA, each experimental DNA comprising a promoter region from a gene in the subset; and (2) at least 100 control spots, each control spot comprising a control DNA, each control DNA comprising a non-promoter region; and (e) determining and comparing a hybridization signal at each of the spots on the microarray between those generated by (1) the amplified control chromatin; and (2) the amplified chromatin fragments; wherein a gene in the subset is said to be regulated by the transcriptional regulator in the cell if a spot comprising a promoter region of said gene displays a higher level of hybridization by the amplified chromatin fragments than by the amplified control chromatin.
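By way of illustration only, the comparison in step (e) can be sketched in Python. The sketch assumes that each spot has a positive intensity in both channels, that ratios are centered on the median of the control (non-promoter) spots, and that a twofold enrichment is used as the cutoff. The function name `bound_genes`, the normalization, the cutoff and the spot names are all hypothetical; the method above requires only that the promoter spot show a higher hybridization level with the amplified chromatin fragments than with the amplified control chromatin.

```python
from math import log2
from statistics import median


def bound_genes(ip_signal, control_signal, control_spots, min_log2_ratio=1.0):
    """Return promoter spots enriched in the immunoprecipitated channel.

    ip_signal / control_signal: dict mapping spot name -> positive intensity.
    control_spots: names of non-promoter spots, used only to center the ratios.
    min_log2_ratio: hypothetical cutoff (1.0 corresponds to twofold enrichment).
    """
    # Log ratio of enriched channel over control channel for every spot.
    ratios = {s: log2(ip_signal[s] / control_signal[s]) for s in ip_signal}
    # Center on the non-promoter spots, which should show no enrichment.
    baseline = median(ratios[s] for s in control_spots)
    return sorted(
        s for s, r in ratios.items()
        if s not in control_spots and r - baseline >= min_log2_ratio
    )


# Hypothetical intensities for three promoter spots and three control spots.
ip = {"geneA": 800.0, "geneB": 420.0, "geneC": 100.0,
      "ctrl1": 100.0, "ctrl2": 110.0, "ctrl3": 90.0}
ctrl = {name: 100.0 for name in ip}
print(bound_genes(ip, ctrl, {"ctrl1", "ctrl2", "ctrl3"}))
```

A real analysis would typically replace the fixed cutoff with an error model that accounts for spot-to-spot noise.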

In another aspect, the invention provides methods of identifying regulatory networks, or pathways, in a cell. The invention provides a method of identifying a transcriptional regulatory network in a cell, the method comprising determining if a transcriptional regulator regulates additional transcriptional regulators in the cell using any of the methods described herein, wherein a transcriptional regulatory network is identified if at least one additional transcriptional regulator is regulated by the transcriptional regulator.

The invention also provides a method of identifying a transcriptional regulatory network in a cell, the method comprising determining if a transcriptional regulator regulates (i) its own promoter; or (ii) a promoter from a plurality of transcriptional regulators; using any of the methods described herein, wherein the experimental DNA comprises (a) a promoter from the transcriptional regulator; and (b) promoters from the plurality of transcriptional regulators; wherein a transcriptional regulatory network is identified if the transcriptional regulator regulates itself or if it regulates at least one of the plurality of transcriptional regulators.

The invention further provides a method of identifying transcriptional regulatory networks in a cell, the method comprising (a) determining, by repeating a method of identifying the targets of a transcriptional regulator for each of a plurality of transcriptional regulators, the genes in a subset which are regulated by each of the plurality of transcriptional regulators, wherein the experimental DNA comprises promoter regions for each of the plurality of transcriptional regulators; and (b) determining if any one of the plurality of transcriptional regulators is regulated by at least one of the plurality of transcriptional regulators; wherein a transcriptional regulatory network is identified if any one of the plurality of transcriptional regulators is regulated by at least one of the plurality of transcriptional regulators.
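As a non-limiting sketch of the network tests described in the preceding paragraphs, binding results can be held as a mapping from each profiled regulator to the set of genes whose promoters it occupies. A network is then identified when at least one regulator occupies its own promoter or the promoter of another profiled regulator. The data structure, the function names and the placeholder identifiers (`TF_A`, `gene1`, and so on) are assumptions made for illustration.

```python
def regulator_edges(targets):
    """List (regulator, regulated_regulator) pairs found in the binding data.

    targets: dict mapping each profiled regulator to the set of genes whose
    promoters it occupies. Self-pairs represent autoregulation.
    """
    regulators = set(targets)
    return sorted(
        (reg, gene)
        for reg, bound in targets.items()
        for gene in bound
        if gene in regulators
    )


def network_identified(targets):
    """True if any profiled regulator regulates itself or another regulator."""
    return bool(regulator_edges(targets))


# Hypothetical binding data for three regulators.
binding = {
    "TF_A": {"TF_B", "gene1"},
    "TF_B": {"TF_A", "TF_B"},
    "TF_C": {"TF_B"},
}
for edge in regulator_edges(binding):
    print(edge)
```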

The invention also provides a DNA microarray for determining promoter occupancy in a human cell, the microarray comprising (1) at least 10,000 experimental spots, each experimental spot comprising an experimental DNA, each experimental DNA comprising a promoter region from a human gene in the subset; and (2) at least 100 control spots, each control spot comprising a control DNA, each control DNA comprising a non-promoter region; wherein at least 75% of the promoter regions comprise from at least 700 bp upstream to at least 200 bp downstream of the transcriptional start site.
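The promoter-region requirement can be illustrated with a short, hypothetical design check. The sketch assumes that genomic coordinates increase along the plus strand, that each spot is described by a (start, end) interval together with the transcriptional start site and strand of its gene, and that the helper names are chosen for illustration only.

```python
def promoter_window(tss, strand, upstream=700, downstream=200):
    """Interval from `upstream` bp before to `downstream` bp after the TSS."""
    if strand == "+":
        return (tss - upstream, tss + downstream)
    if strand == "-":
        # On the minus strand, upstream lies at higher coordinates.
        return (tss - downstream, tss + upstream)
    raise ValueError("strand must be '+' or '-'")


def covers_window(region, tss, strand):
    """True if the spotted region spans the whole promoter window."""
    start, end = region
    win_start, win_end = promoter_window(tss, strand)
    return start <= win_start and end >= win_end


def fraction_covering(spots):
    """Fraction of (region, tss, strand) spots that span the window."""
    hits = sum(covers_window(region, tss, strand) for region, tss, strand in spots)
    return hits / len(spots)


# Hypothetical spots; the second one starts too close to the TSS.
spots = [
    ((9000, 10500), 10000, "+"),
    ((9500, 10500), 10000, "+"),
    ((9700, 10800), 10000, "-"),
    ((9300, 10200), 10000, "+"),
]
print(fraction_covering(spots) >= 0.75)
```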

Another aspect of the invention provides a method of estimating if a transcriptional regulator is a global transcriptional regulator, the method comprising (a) selectively isolating chromatin from a tissue; (b) identifying promoter regions from the chromatin which are bound by a candidate global transcriptional regulator; (c) identifying promoter regions from the chromatin which are bound by a member of the basal transcriptional machinery; and (d) comparing the promoter regions identified in steps (b) and (c) to determine the ratio between (i) the number of promoter regions bound by both the candidate global transcriptional regulator and the member of the basal transcriptional machinery; and (ii) the number of promoter regions bound by the member of the basal transcriptional machinery, wherein a transcriptional regulator is a global transcriptional regulator when the ratio is greater than 0.2.
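The ratio of step (d) is a simple set computation. The following sketch, with hypothetical function names and placeholder promoter identifiers, assumes that the promoters bound in steps (b) and (c) are available as collections of identifiers.

```python
def global_regulator_ratio(regulator_bound, basal_bound):
    """Ratio of promoters bound by both factors to promoters bound by the
    member of the basal transcriptional machinery."""
    basal = set(basal_bound)
    if not basal:
        raise ValueError("no promoters bound by the basal machinery")
    return len(set(regulator_bound) & basal) / len(basal)


def is_global_regulator(regulator_bound, basal_bound, threshold=0.2):
    """Apply the 'greater than 0.2' criterion stated above."""
    return global_regulator_ratio(regulator_bound, basal_bound) > threshold


# Hypothetical example: ten active promoters, four also bound by the regulator.
basal = {f"p{i}" for i in range(1, 11)}
candidate = {"p1", "p2", "p3", "p9", "p42"}
print(global_regulator_ratio(candidate, basal))
```

Promoters bound by the candidate but not by the basal machinery (here `p42`) do not enter the ratio.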

The invention further provides methods of identifying targets for therapeutics. In one aspect, the invention provides a method of identifying at least one target gene for the development of a therapeutic to treat or prevent a disorder in a subject, wherein at least one form of the disorder is caused by an altered activity in a transcriptional regulator or in a suspected transcriptional regulator, the method comprising (a) identifying the genes regulated by the transcriptional regulator in a cell; (b) determining if the transcriptional regulator is a broad-acting transcriptional regulator or a narrow-acting transcriptional regulator, wherein if the transcriptional regulator is a broad-acting transcriptional regulator then the transcriptional regulator is a target gene for the development of a therapeutic, and wherein if the transcriptional regulator is a narrow-acting transcriptional regulator then (i) determining if at least one gene regulated by the transcriptional regulator is likely causative in the disorder, wherein a gene that is likely causative in the disorder is a target gene for the development of a therapeutic; and (ii) reiterating steps (a) and (b) for at least one gene that is regulated by the transcriptional regulator in the cell and that either (1) encodes a transcriptional regulator or (2) is suspected to encode a transcriptional regulator, with the modification that the transcriptional regulator of steps (a) and (b) is said gene, thereby identifying at least one target gene for the development of a therapeutic to treat or prevent a disorder in the subject.
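The reiteration of steps (a) and (b) lends itself to a recursive sketch. In the hypothetical code below, the four callables stand in for experimental determinations (target identification, the broad-acting versus narrow-acting call, the causality assessment, and whether a gene encodes or is suspected to encode a transcriptional regulator). The bookkeeping of already visited regulators is an added assumption, included so that regulators which bind each other's promoters do not cause endless reiteration.

```python
def find_therapeutic_targets(regulator, targets_of, is_broad, is_causative,
                             encodes_regulator, _seen=None):
    """Return the set of candidate target genes reachable from `regulator`."""
    seen = set() if _seen is None else _seen
    if regulator in seen:
        return set()  # already examined; avoids looping through feedback
    seen.add(regulator)

    # Step (b): a broad-acting regulator is itself the target.
    if is_broad(regulator):
        return {regulator}

    found = set()
    for gene in targets_of(regulator):  # step (a)
        if is_causative(gene):  # step (b)(i)
            found.add(gene)
        if encodes_regulator(gene):  # step (b)(ii): reiterate on this gene
            found |= find_therapeutic_targets(
                gene, targets_of, is_broad, is_causative,
                encodes_regulator, _seen=seen)
    return found


# Hypothetical network: R0 is narrow-acting, R2 is broad-acting.
network = {"R0": ["g1", "R1", "R2"], "R1": ["g2", "R0"], "R2": ["g3", "g4"]}
result = find_therapeutic_targets(
    "R0",
    targets_of=lambda r: network.get(r, []),
    is_broad=lambda r: r == "R2",
    is_causative=lambda g: g == "g2",
    encodes_regulator=lambda g: g in network,
)
print(sorted(result))
```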

The invention also provides methods of treating or preventing disease. In one aspect, the invention provides a method of treating or preventing type II diabetes in a subject, comprising administering to the subject a therapeutically effective amount of an agent that increases the global transcriptional activity of HNF4alpha.

In another aspect, the invention provides a method of treating or preventing a disorder associated with low transcriptional activity of HNF4alpha in a subject, comprising administering to the subject a therapeutically effective amount of an agent that increases the global transcriptional activity of HNF4alpha. A related aspect provides a method of treating or preventing a disorder associated with high transcriptional activity of HNF4alpha in a subject, comprising administering to the subject a therapeutically effective amount of an agent that decreases the global transcriptional activity of HNF4alpha.

The invention also provides a method of increasing the global transcriptional activity in a liver or a pancreatic cell comprising contacting the cell with an agent which increases the global transcriptional activity of HNF4alpha. A related aspect provides a method of decreasing the global transcriptional activity in a liver or a pancreatic cell comprising contacting the cell with an agent which decreases the global transcriptional activity of HNF4alpha.

One aspect of the invention provides methods of regulating the expression level of genes. One aspect provides a method of regulating the expression level of any one of the genes in FIG. 13 in a hepatocyte, the method comprising contacting the cell with an agent which regulates the transcriptional activity of HNF1alpha. A related aspect provides a method of regulating the expression level of any one of the genes in FIG. 14 in a pancreatic cell, the method comprising contacting the cell with an agent which regulates the transcriptional activity of HNF1alpha.

Another aspect of the invention provides a method of regulating the expression level of any one of the genes in FIG. 16 in a hepatocyte, the method comprising contacting the cell with an agent which regulates the transcriptional activity of HNF6. A related aspect provides a method of regulating the expression level of any one of the genes in FIG. 17 in a pancreatic cell, the method comprising contacting the cell with an agent which regulates the transcriptional activity of HNF6.

Yet another aspect of the invention provides a method of regulating the expression level of any one of the genes in FIG. 18 in a hepatocyte, the method comprising contacting the cell with an agent which regulates the transcriptional activity of HNF4alpha. A related aspect provides a method of regulating the expression level of any one of the genes in FIG. 19 in a pancreatic cell, the method comprising contacting the cell with an agent which regulates the transcriptional activity of HNF4alpha.

The invention also provides methods for identifying transcriptionally active genes that are regulated by a transcriptional regulator in a cell. In one aspect, the invention provides a method of identifying transcriptionally active genes that are regulated by a transcriptional regulator in a cell, the method comprising (a) selectively isolating chromatin from a tissue; (b) identifying promoter regions from the chromatin that are bound by the transcriptional regulator; (c) identifying promoter regions from the chromatin that are bound by a member of the basal transcriptional machinery; and (d) comparing the promoter regions identified in steps (b) and (c) to determine overlapping genes, wherein the overlapping genes are transcriptionally active genes regulated by the transcriptional regulator.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A-1C show genome-scale location analysis of HNF regulators in human tissues. (A) Hepatocytes and pancreatic islets were obtained from tissue distribution programs. These cells were treated with formaldehyde to covalently link transcription factors to DNA sites of interaction. Cells were harvested, and chromatin in cell lysates was sheared by sonication. The regulator-DNA complexes were enriched by chromatin immunoprecipitation with specific antibodies, the crosslinks were reversed, and enriched DNA fragments and control genomic DNA fragments were amplified using ligation-mediated PCR. The amplified DNA preparations, labeled with distinct fluorophores, were mixed and hybridized onto a promoter array. (B) Venn diagram showing the overlap of HNF1α, HNF6, and HNF4α bound promoters in hepatocytes (top) and pancreatic islets (bottom). (C) The collection of genes occupied by RNA polymerase II in hepatocytes is displayed as a circle, with the genes bound by HNF1α, HNF6, and HNF4α outlined collectively as a fraction of the chart. The relative contributions of HNF1α, HNF6, and HNF4α are shown as framing arcs.

FIGS. 2A-2B show transcriptional regulatory networks and motifs. (A) HNF1α, HNF6, and HNF4α are at the center of tissue-specific transcriptional regulatory networks. In these examples selected for illustration, regulatory proteins and their gene targets are represented as circles and boxes, respectively. Solid arrows indicate protein-DNA interactions, and genes encoding regulators are linked to their protein products by dashed lines. The HNF4α7 promoter, also known as the P2 promoter (24, 25), was recently implicated as a major human diabetes susceptibility locus (see text). (B) Examples of regulatory network motifs in hepatocytes. For instance, in the multi-component loop, HNF1α protein binds to the promoter of the HNF4α gene, and the HNF4α protein binds to the promoter of the HNF1α gene. These network motifs were uncovered by searching binding data with various algorithms; for details on the algorithms used and a full list of motifs found, see (20).

FIG. 3 shows one embodiment of a strategy for the identification of at least one target gene of a master regulator for the development of a therapeutic to treat or prevent a disorder.

FIG. 4 shows a Venn diagram of the overlap of two single, independent ChIP experiments using hepatocytes with anti-HNF4α antibodies sc-6556 and sc-8987.

FIG. 5 shows a Western blot of HNF4α in HepG2 cells using 50 μg of cell lysate protein with Ab sc-6556. The lower running band is approximately 50 kDa, which is the canonical molecular weight for HNF4α, and the higher running band is at the appropriate location for an HNF4α dimer. A very similar gel showing HNF4α antibody specificity for sc-6556 is available at the Santa Cruz website (www.scbt.com).

FIGS. 6A-6D show scatterplots of attempted chromatin immunoprecipitations performed with the anti-HNF4α antibody sc-6556 using Jurkat (T-lymphocyte derived, 6A), BJ-T (foreskin fibroblast derived, 6B), and U937 (histiocyte derived, 6C) cells. To demonstrate the noise inherent in the array analysis, applicants show a scatterplot of a sample of input DNA, split, labeled with the two fluorophores, and hybridized to an array (6D). Identical control experiments performed using the anti-HNF1α antibody sc-6547 afforded essentially identical results.

FIG. 7 shows a scatterplot of a chromatin immunoprecipitation performed with pre-immune commercial rabbit serum using hepatocytes (left). Goat pre-immune serum and two rabbit sera from different individuals gave a similar scatterplot. For comparison, applicants show the scatterplot for an equivalent ChIP with the anti-HNF4α antibody sc-6556 using hepatocytes (right).

FIG. 8 shows a Venn diagram of the overlap of the sets of promoters bound by HNF4α and RNA Pol II in hepatocytes and pancreatic islets.

FIG. 9 shows a composite gel of gene-specific chromatin immunoprecipitation reactions using anti-HNF4α antibody sc-6556 with crosslinked human hepatocytes.

FIG. 10 shows a composite gel of gene-specific chromatin immunoprecipitation reactions using anti-HNF1α antibody sc-6547 with crosslinked human hepatocytes.

FIG. 11 shows a partial list of proximal promoters occupied by HNF1α in human hepatocytes and pancreatic islets. These genes were assigned to functional categories using the program ProtoGo; genes not in this automated GO ontology database were assigned using LocusLink information. Four genes are shown for each tissue/category combination; for some combinations, fewer than four promoters qualified as targets. Hypothetical and functionally uncharacterized genes are not shown. A complete list of targets is available in FIGS. 13 and 14.

FIG. 12 shows occupancy of BJ-T and tissue-specific promoter sets by HNF factors. (*) Indicates that comparisons between BJ-T and primary tissues used only a subset of Hu13K array promoters, as RNA Pol II was profiled in BJ-T cells using a smaller, prototype array. The denominator in the above fractions represents the number of targets the HNF factor of interest occupied in the set of RNA Pol II occupied promoters that are either BJ-T specific or primary tissue specific.

FIG. 13 shows HNF1α bound promoters in hepatocytes.

FIG. 14 shows HNF1α bound promoters in pancreatic islets.

FIGS. 15A-15D show genes previously suggested to be regulated by HNF1α and HNF4α. ‘Direct’ binding is in vivo ChIP and in vivo footprinting, ‘in vitro’ binding is primarily gel mobility retardation assays and in vitro footprinting, and ‘indirect’ is primarily transient transfections. ‘Sequence-based’ uses a number of different criteria to qualify binding. Note that some duplicate reports are omitted, as are a handful of recent large-scale screens (e.g., Tronche 1997, Shih 2001, etc.).

FIG. 16 shows HNF6 bound promoters in hepatocytes.

FIG. 17 shows HNF6 bound promoters in pancreatic islets.

FIGS. 18A-18C show HNF4α bound promoters in hepatocytes.

FIGS. 19A-19C show HNF4α bound promoters in pancreatic islets.

FIGS. 20A-20B show the feed forward regulatory motifs in hepatocytes. The regulatory modules here were derived as described in the exemplification. Feed forwards only involving HNF1α and HNF4α are also multi-input motifs, as they bind each other's promoters in a multicomponent loop.

FIGS. 21A-21B show multi-input motifs in hepatocytes. The regulatory modules here were derived as described in the exemplification. MIMs for the HNF6/HNF4α and HNF1α/HNF4α are listed in FIG. 20 as feedforward motifs.

FIGS. 22A-22B show the feed forward regulatory motifs in pancreatic islets. The regulatory modules here were derived as described in Supporting Online Material. Feed forwards only involving HNF1α and HNF4α are also multi-input motifs, as they bind each other's promoters in a multicomponent loop.

FIGS. 23A-23B show multi-input motifs in pancreatic islets. The regulatory modules here were derived as described in Supporting Online Material. MIMs for the HNF6/HNF4α and HNF1α/HNF4α are listed in FIG. 22 as feedforward motifs.

FIGS. 24A-24B show transcriptional regulators occupied by HNF1α and HNF4α. Network of DNA regulators downstream of HNF1α and HNF4α in hepatocytes and islets. Target genes that are among the Gene Ontology “DNA-regulators” category were compiled, and are listed according to functional subcategory.

DETAILED DESCRIPTION OF THE INVENTION

I. Overview

In certain aspects, the invention provides methods related to transcriptional regulators. Some aspects of the invention provide methods for the identification of genes whose transcription is regulated by a specific transcriptional regulator in a cell. Some of these methods comprise determining the promoter occupancy of the transcriptional regulator using a combination of chromatin immunoprecipitation and/or DNA microarray analysis of the promoter regions that are physically associated with the transcriptional regulator in the cell. In some embodiments of the methods described herein, the DNA microarray comprises both experimental spots containing promoter DNA, and control spots containing non-promoter DNA. The methods described herein may be applied to any cell type, including transplant grade primary human tissue. Furthermore, the methods described herein can be used to compare the function of transcriptional regulators across cell types, or across two populations, such as healthy and disease-afflicted subjects.

In a related aspect, the invention provides methods of identifying regulatory networks, or pathways. Some methods comprise identifying the transcriptional regulators which are regulated by a given transcriptional regulator, and optionally, determining the genes that are regulated by those transcriptional regulators. Pathways that may be identified using the methods described herein include autoregulatory, multicomponent, and feed-forward loops, as well as regulatory chains.
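As one hedged illustration of how such pathways might be searched for in promoter-occupancy data, the sketch below looks for multicomponent loops (two regulators that each occupy the other's promoter) and feed-forward loops (a regulator that occupies the promoter of a second regulator, with both occupying the promoter of a common target). The input format, the function names and the placeholder identifiers are assumptions; the algorithms actually used are not reproduced here.

```python
def multicomponent_loops(targets):
    """Pairs of regulators that occupy each other's promoters."""
    loops = set()
    for reg_a, bound_a in targets.items():
        for reg_b in bound_a:
            if reg_b != reg_a and reg_a in targets.get(reg_b, ()):
                loops.add(tuple(sorted((reg_a, reg_b))))
    return sorted(loops)


def feed_forward_loops(targets):
    """Triples (A, B, C): A binds B, and both A and B bind target C."""
    motifs = []
    for reg_a, bound_a in targets.items():
        for reg_b in bound_a:
            if reg_b == reg_a or reg_b not in targets:
                continue  # B must be a different, profiled regulator
            for gene_c in targets[reg_b]:
                if gene_c in bound_a and gene_c not in (reg_a, reg_b):
                    motifs.append((reg_a, reg_b, gene_c))
    return sorted(motifs)


# Hypothetical occupancy data: regulator -> genes whose promoters it binds.
occupancy = {
    "TF_A": {"TF_B", "gene1", "gene2"},
    "TF_B": {"TF_A", "gene1"},
    "TF_C": {"gene2"},
}
print(multicomponent_loops(occupancy))
print(feed_forward_loops(occupancy))
```

In this toy data set, the two regulators that form the multicomponent loop also appear in feed-forward loops in both orientations, since each occupies the other's promoter.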

The invention also provides methods of determining if a transcriptional regulator is a global transcriptional regulator. In some aspects, such methods comprise determining the promoter occupancy of both a transcriptional regulator and a member of the basal transcriptional machinery. Comparison of the promoter occupancy by the transcriptional regulator and by the member of the basal transcriptional machinery allows the identification of transcriptionally active promoters that are bound and regulated by the transcriptional regulator. Other methods further comprise extrapolating from the set of promoters that were examined to the total number of promoters in the genome to determine the approximate number of transcriptionally active promoters in a cell that are under the control of a specific transcription factor or to determine if the transcriptional regulator is a global transcriptional regulator.
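The extrapolation step amounts to a proportional scaling, sketched below. The sketch assumes that the promoters examined on the array are representative of all promoters in the genome; the function name and all of the numbers in the example are hypothetical and are not results from the experiments described herein.

```python
def extrapolate_active_targets(n_bound_and_active, n_promoters_examined,
                               n_promoters_in_genome):
    """Scale the count of active promoters bound by the regulator on the
    array up to the whole genome, assuming the array is representative."""
    if n_promoters_examined <= 0:
        raise ValueError("at least one promoter must have been examined")
    # Multiply before dividing to limit floating-point rounding.
    return round(n_bound_and_active * n_promoters_in_genome / n_promoters_examined)


# Hypothetical counts, for illustration only.
print(extrapolate_active_targets(1300, 13000, 30000))
```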

Other aspects of the invention provide methods of identifying therapeutic targets to treat disease. One specific aspect of the invention relates to identifying at least one target gene for the development of a therapeutic agent to treat or prevent a disorder in a subject, preferably a disorder in which at least one form of the disorder is caused by an altered activity in a transcriptional regulator or in a gene suspected to encode a transcriptional regulator. Some of the methods provided herein to identify therapeutic targets comprise determining if a transcriptional regulator implicated in the disease is a broad-acting or a narrow-acting transcriptional regulator, such as by identifying at least a subset of the genes that it regulates in a cell, wherein broad-acting transcriptional regulators are targets for therapeutic agents. If the transcriptional regulator is narrow-acting, then the genes that it regulates may be examined further to determine if any are broad-acting transcriptional regulators (for those genes encoding transcriptional regulators) or if any of the genes are causative to the disease state, i.e., they regulate a pathway or network that is impaired in the disease state.

The invention further provides methods for the treatment of disease. Some aspects of the invention provide methods of treating metabolic disorders, such as type II diabetes. Specific aspects of the invention provide methods of treating or preventing type II diabetes in a subject by administering to the subject a therapeutically effective amount of an agent that increases the global transcriptional activity of HNF4α. Furthermore, the invention provides methods for modulating the expression level of genes. Such methods are based, in part, on the finding by Applicants of genes which are transcriptionally regulated by HNF1α, HNF4α or HNF6 in hepatocytes and pancreatic cells. In a related aspect, the invention provides methods of modulating the expression level of, and alleviating a disease state associated with the abnormal expression of, the genes in FIGS. 13-19 by modulating the transcriptional activity or expression of HNF1α, HNF4α or HNF6. In specific embodiments, the expression of the genes is modulated in hepatocytes, pancreatic cells, or both.

II. Definitions

For convenience, certain terms employed in the specification, examples, and appended claims are collected here. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The articles “a” and “an” are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, “an element” means one element or more than one element.

The term “including” is used herein to mean, and is used interchangeably with, the phrase “including but not limited to”.

The term “or” is used herein to mean, and is used interchangeably with, the term “and/or,” unless context clearly indicates otherwise.

The term “such as” is used herein to mean, and is used interchangeably with, the phrase “such as but not limited to”.

A “patient” or “subject” to be treated by the method of the invention can mean either a human or non-human animal, preferably a mammal.

The terms “alpha” and “α” are used interchangeably, as are the terms “beta” and “β”.

The term “encoding” comprises an RNA product resulting from transcription of a DNA molecule, a protein resulting from the translation of an RNA molecule, or a protein resulting from the transcription of a DNA molecule and the subsequent translation of the RNA product.

A “promoter” is a nucleic acid sequence that directs transcription of a nucleic acid. A promoter includes nucleic acid sequences near the start site of transcription, e.g., a TATA box, see, e.g., Butler and Kadonaga (2002) Genes Dev. 16:2583-2592; Georgel (2002) Biochem. Cell Biol. 80:295-300. A promoter also optionally includes distal enhancer or repressor elements, which can be located as much as several thousand base pairs on either side from the start site of transcription. A “constitutive” promoter is a promoter that is active under most environmental and developmental conditions, while an “inducible” promoter is a promoter that is active or activated under, e.g., specific environmental or developmental conditions.

The term “expression” is used herein to mean the process by which a polypeptide is produced from DNA. The process involves the transcription of the gene into mRNA and the translation of this mRNA into a polypeptide. Depending on the context in which used, “expression” may refer to the production of RNA, protein or both.

The term “recombinant” is used herein to mean any nucleic acid comprising sequences which are not adjacent in nature. A recombinant nucleic acid may be generated in vitro, for example by using the methods of molecular biology, or in vivo, for example by insertion of a nucleic acid at a novel chromosomal location by homologous or non-homologous recombination.

The term “transcriptional regulator” refers to a biochemical element that acts to prevent or inhibit the transcription of a promoter-driven DNA sequence under certain environmental conditions (e.g., a repressor or nuclear inhibitory protein), or to permit or stimulate the transcription of the promoter-driven DNA sequence under certain environmental conditions (e.g., an inducer or an enhancer).

The term “microarray” refers to an array of distinct polynucleotides or oligonucleotides synthesized on a substrate, such as paper, nylon or other type of membrane, filter, chip, glass slide, or any other suitable solid support.

The terms “disorders” and “diseases” are used inclusively and refer to any deviation from the normal structure or function of any part, organ or system of the body (or any combination thereof). A specific disease is manifested by characteristic symptoms and signs, including biological, chemical and physical changes, and is often associated with a variety of other factors including, but not limited to, demographic, environmental, employment, genetic and medically historical factors. Certain characteristic signs, symptoms, and related factors can be quantitated through a variety of methods to yield important diagnostic information.

The terms “level of expression of a gene in a cell” or “gene expression level” refer to the level of mRNA, as well as pre-mRNA nascent transcript(s), transcript processing intermediates, mature mRNA(s) and degradation products, encoded by the gene in the cell.

The term “modulation” refers to upregulation (i.e., activation or stimulation), downregulation (i.e., inhibition or suppression) of a response, or the two in combination or apart. A “modulator” is a compound or molecule that modulates, and may be, e.g., an agonist, antagonist, activator, stimulator, suppressor, or inhibitor.

The term “agonist” refers to an agent that mimics or up-regulates (e.g., potentiates or supplements) the bioactivity of a protein, e.g., polypeptide X. An agonist may be a wild-type protein or derivative thereof having at least one bioactivity of the wild-type protein. An agonist may also be a compound that upregulates expression of a gene or which increases at least one bioactivity of a protein. An agonist may also be a compound which increases the interaction of a polypeptide with another molecule, e.g., a target peptide or nucleic acid.

The term “antagonist” refers to an agent that downregulates (e.g., suppresses or inhibits) at least one bioactivity of a protein. An antagonist may be a compound which inhibits or decreases the interaction between a protein and another molecule, e.g., a target peptide or enzyme substrate. An antagonist may also be a compound that downregulates expression of a gene or which reduces the amount of expressed protein present.

The term “prophylactic” or “therapeutic” treatment refers to administration to the subject of one or more of the subject compositions. If it is administered prior to clinical manifestation of the unwanted condition (e.g., disease or other unwanted state of the host animal) then the treatment is prophylactic, i.e., it protects the host against developing the unwanted condition, whereas if administered after manifestation of the unwanted condition, the treatment is therapeutic (i.e., it is intended to diminish, ameliorate or maintain the existing unwanted condition or side effects therefrom).

The term “therapeutic effect” refers to a local or systemic effect in animals, particularly mammals, and more particularly humans caused by a pharmacologically active substance. The term thus means any substance intended for use in the diagnosis, cure, mitigation, treatment or prevention of disease or in the enhancement of desirable physical or mental development and conditions in an animal or human. The phrase “therapeutically-effective amount” means that amount of such a substance that produces some desired local or systemic effect at a reasonable benefit/risk ratio applicable to any treatment. In certain embodiments, a therapeutically-effective amount of a compound will depend on its therapeutic index, solubility, and the like. For example, certain compounds discovered by the methods of the present invention may be administered in a sufficient amount to produce a reasonable benefit/risk ratio applicable to such treatment.

A probe that is “labeled” is detectable, either directly or indirectly, by spectroscopic, photochemical, biochemical, immunochemical, isotopic, or chemical means. For example, useful labels include ³²P, ³³P, ³⁵S, ¹⁴C, ³H, ¹²⁵I, stable isotopes, fluorescent dyes and fluorettes (Rozinov and Nolan (1998) Chem. Biol. 5:713-728; Molecular Probes, Inc. (2003) Catalogue, Molecular Probes, Eugene Oreg.), electron-dense reagents, enzymes and/or substrates, e.g., as used in enzyme-linked immunoassays such as those using alkaline phosphatase or horseradish peroxidase. The label or detectable moiety is typically bound, either covalently, through a linker or chemical bond, or through ionic, van der Waals or hydrogen bonds, to the molecule to be detected. “Radiolabeled” refers to a compound to which a radioisotope has been attached through covalent or non-covalent means. A “fluorophore” is a compound or moiety that absorbs radiant energy of one wavelength and emits radiant energy of a second, longer wavelength.

A “labeled nucleic acid probe or oligonucleotide” is one that is bound, either covalently, through a linker or a chemical bond, or noncovalently, through ionic, van der Waals, electrostatic, or hydrogen bonds to a label such that the presence of the probe can be detected by detecting the presence of the label bound to the probe. The probes are preferably directly labeled as with isotopes, chromophores, fluorophores, chromogens, or indirectly labeled such as with biotin to which a streptavidin complex or avidin complex can later bind.

A “nucleic acid probe” is a nucleic acid capable of binding to a target nucleic acid of complementary sequence, usually through complementary base pairing, e.g., through hydrogen bond formation. A probe may include natural, e.g., A, G, C, or T, or modified bases, e.g., 7-deazaguanosine, inosine, etc. The bases in a probe can be joined by a linkage other than a phosphodiester bond. Probes can be peptide nucleic acids in which the constituent bases are joined by peptide bonds rather than phosphodiester linkages. It will be understood by one of skill in the art that probes may bind target sequences lacking complete complementarity with the probe sequence depending upon the stringency of the hybridization conditions.

“Small molecule” is defined as a molecule with a molecular weight that is less than 10 kD, typically less than 2 kD, and preferably less than 1 kD. Small molecules include, but are not limited to, inorganic molecules, organic molecules, organic molecules containing an inorganic component, molecules comprising a radioactive atom, synthetic molecules, peptide mimetics, and antibody mimetics. As a therapeutic, a small molecule may be more permeable to cells, less susceptible to degradation, and less apt to elicit an immune response than large molecules. Small molecule toxins have been described; see, e.g., U.S. Pat. No. 6,326,482 issued to Stewart, et al.

A small molecule may also refer to a composition which has a molecular weight of less than about 1000 Da.

III. Identification of Transcriptional Targets and Transcriptional Networks

One aspect of the invention provides a method of determining which genes from a subset of genes are regulated by a transcriptional regulator in a cell, the method comprising (a) selectively isolating chromatin from a cell which expresses the transcriptional regulator to generate isolated chromatin; (b) selectively isolating chromatin fragments from the isolated chromatin to generate bound chromatin fragments, wherein the bound chromatin fragments are bound by the transcriptional regulator; (c) amplifying both the bound chromatin fragments to generate amplified chromatin fragments and the isolated chromatin to generate amplified control chromatin; (d) hybridizing the amplified control chromatin and the amplified chromatin fragments to a DNA microarray, wherein the DNA microarray comprises (1) at least 10,000 experimental spots, each experimental spot comprising an experimental DNA, each experimental DNA comprising a promoter region from a gene in the subset; and (2) at least 100 control spots, each control spot comprising a control DNA, each control DNA comprising a non-promoter region; and (e) determining and comparing a hybridization signal at each of the spots on the microarray between those generated by (1) the amplified control chromatin; and (2) the amplified chromatin fragments; wherein a gene in the subset is said to be regulated by the transcriptional regulator in the cell if a spot comprising a promoter region of said gene displays a higher level of hybridization by the amplified chromatin fragments than by the amplified control chromatin.

Chromatin, and in particular chromatin fragments that are bound by a transcriptional regulator, may be isolated by any method known to one skilled in the art, including by cross-linking the transcriptional regulator to chromatin, fragmenting the chromatin, and immunoprecipitating the transcriptional regulator.

In a preferred embodiment, the chromatin fragments bound by the transcriptional regulator are isolated using chromatin immunoprecipitation (ChIP). Briefly, this technique involves the use of a specific antibody to immunoprecipitate chromatin complexes comprising the corresponding antigen, i.e., the transcriptional regulator, and examination of the nucleotide sequences present in the immunoprecipitate. Immunoprecipitation of a particular sequence by the antibody is indicative of interaction of the antigen with that sequence. See, for example, O'Neill et al. in Methods in Enzymology, Vol. 274, Academic Press, San Diego, 1999, pp. 189-197; Kuo et al. (1999) Methods 19:425-433; and Ausubel et al., supra, Chapter 21.

In one embodiment, the chromatin immunoprecipitation technique is applied as follows. Cells which express the transcriptional regulator of interest, such as a native transcriptional regulator or a recombinant transcriptional regulator, are treated with an agent that crosslinks the transcriptional regulator to chromatin if that transcriptional regulator is stably bound to it. In one embodiment of the methods described herein, the crosslinking is formaldehyde crosslinking (Solomon, M. J. and Varshavsky, A., Proc. Natl. Acad. Sci. USA 82:6470-6474; Orlando, V., TIBS, 25:99-104). UV light may also be used (Pashev et al. Trends Biochem Sci. 1991; 16(9):323-6; Zhang L et al. Biochem Biophys Res Commun. 2004; 322(3):705-11).

Subsequent to crosslinking, cellular nucleic acid is isolated, sheared, such as by sonication, and incubated in the presence of an antibody directed against the transcriptional regulator. Antibody-antigen complexes are precipitated and crosslinks are reversed (for example, formaldehyde-induced DNA-protein crosslinks can be reversed by heating), and the sequence content of the immunoprecipitated DNA is then tested for the presence of a specific sequence, for example, promoter regions. The antibody may bind directly to an epitope on the transcriptional regulator or it may bind to a tag on the regulator, such as a myc tag when used with an anti-Myc antibody (Santa Cruz Biotechnology, sc-764).

In yet another embodiment, a non-antibody agent with affinity for the transcriptional regulator or for a tag fused to it is used in place of the antibody. For example, if the transcriptional regulator comprises an affinity tag, such as a six-histidine tag, complexes may be isolated by affinity chromatography on nickel-containing sepharose. Additional variations on ChIP methods within the scope of the invention may be found in Kurdistani et al. Methods. 2003; 31(1):90-5; O'Neill et al. Methods. 2003; 31(1):76-82; Spencer et al., Methods. 2003; 31(1):67-75; and Orlando et al. Methods 11:205-214 (1997).

In an alternate embodiment of the methods described herein for identifying genes regulated by a transcriptional regulator, amplified chromatin fragments from a control immunoprecipitation reaction are used in place of the isolated chromatin as a control. For example, an antibody that does not react with the transcription factor being tested may be used in a chromatin IP procedure to isolate control chromatin, which can then be compared to the chromatin isolated using an antibody that does react with the transcriptional regulator. In preferred embodiments, the antibody that does not react with the transcription factor being tested also does not react with other transcriptional regulators or DNA binding proteins.

In one embodiment, the amplified control chromatin and the amplified chromatin fragments are generated from their corresponding template DNA using ligation-mediated polymerase chain reaction (LM-PCR) (e.g., see Current Protocols in Molecular Biology, Ausubel, F. M. et al., eds. 1991, and U.S. Application No. 2003/0143599, the teachings of which are incorporated herein by reference in their entirety). In specific embodiments, LM-PCR comprises fluorescently labeling amplified DNA by including fluorescently-tagged nucleotides in the LM-PCR reaction. Additional variations for manipulating and examining chromatin using microarrays have been described in U.S. Pat. No. 6,410,243, the teachings of which are incorporated herein by reference.

In one embodiment, the labeled or unlabeled probes are hybridized to a DNA microarray, such as is described in U.S. Pat. No. 6,410,243. Microarrays, also called “biochips” or “arrays”, are miniaturized devices, typically with dimensions in the micrometer to millimeter range, for performing chemical and biochemical reactions and are particularly suited for embodiments of the invention. Arrays may be constructed via microelectronic and/or microfabrication using essentially any and all techniques known and available in the semiconductor industry and/or in the biochemistry industry, provided only that such techniques are amenable to and compatible with the deposition and screening of polynucleotide sequences. Microarrays are particularly desirable for their virtues of high sample throughput and low cost for generating profiles and other data.

In one embodiment of the methods described, amplified control chromatin and the amplified chromatin fragments are hybridized to a DNA microarray that includes experimental spots that represent all or a subset (e.g., a chromosome or chromosomes) of the genome. The fluorescent intensity of each experimental spot on the microarray from the amplified chromatin fragments relative to the amplified control chromatin indicates whether the protein of interest is bound to the DNA region located at that particular spot. Hence, the methods described herein allow the detection of protein-DNA interactions across an entire genome.

In some embodiments of the methods described herein, the promoter region of a gene comprises from at least 700 bp upstream to at least 200 bp downstream of the transcriptional start site of the gene. In some embodiments, the promoter region is at least about 30, 40, 50, or 60 nucleotides in length. In specific embodiments, the promoter region of a gene as found on the spots of the microarray comprises a sequence of at least 30 nucleotides whose sequence is identical to a region stretching from 3 kb upstream to 1 kb downstream of the transcriptional start site of said gene. In some embodiments, the DNA microarray includes control spots of non-promoter DNA. In a specific embodiment, the non-promoter region comprises an open reading frame. In preferred embodiments, the non-promoter regions comprise genomic regions which are not bound by transcriptional regulators, and preferably which are not bound by the transcriptional regulator being tested. In some embodiments, not all the experimental spots or the control spots comprise experimental DNA or control DNA, respectively. Furthermore, in some specific embodiments some spots comprise control DNA which comprises promoter DNA. One skilled in the art may determine the number of experimental or control spots for a given application.

In some embodiments of the methods described herein, the level of hybridization of the amplified chromatin fragments to each experimental spot is normalized by the level of hybridization of the amplified chromatin fragments to the control spots. In specific embodiments, the normalization is performed by subtracting the mean level of hybridization of the amplified chromatin fragments to the control spots from the level of hybridization of the amplified chromatin fragments at each experimental spot.
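By way of illustration only, the subtraction-based normalization described above may be sketched as follows. The function name, spot identifiers and intensity values are hypothetical and do not come from the specification; the sketch assumes that a single numeric hybridization level is available for each spot.

```python
from statistics import mean

def normalize_by_controls(experimental, control):
    """Subtract the mean control-spot signal from every experimental spot."""
    background = mean(control)
    return {spot: signal - background for spot, signal in experimental.items()}

# Hypothetical hybridization levels of the amplified chromatin fragments.
experimental = {"promoter_A": 1500.0, "promoter_B": 420.0}
control = [380.0, 400.0, 420.0]  # non-promoter control spots

normalized = normalize_by_controls(experimental, control)
print(normalized)
```

Other normalization schemes, such as median-based or ratio-based scaling, could be substituted without changing the structure of the sketch.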

Methods of analyzing data from microarrays are well-described in the art, including in DNA Microarrays: A Molecular Cloning Manual, ed. by Bowtell and Sambrook (Cold Spring Harbor Laboratory Press, 2002); Microarrays for an Integrative Genomics by Kohane (MIT Press, 2002); A Biologist's Guide to Analysis of DNA Microarray Data, by Knudsen (Wiley, John & Sons, Incorporated, 2002); DNA Microarrays: A Practical Approach, Vol. 205 by Schena (Oxford University Press, 1999); and Methods of Microarray Data Analysis II, ed. by Lin et al. (Kluwer Academic Publishers, 2002), hereby incorporated by reference in their entirety.

In some embodiments of any of the methods described herein, the transcriptional regulator is native to the cell. By native it is meant that the transcriptional regulator naturally occurs in the cell. In other embodiments, the transcriptional regulator is a recombinant transcriptional regulator. In some embodiments, the transcriptional regulator originates from a species which is different from that of the cell. In some embodiments, the transcriptional regulator is a viral transcriptional regulator. In such embodiments, a cell may be contacted with a virus and chromatin extracted from the infected cell after allowing sufficient time for the viral proteins to be expressed. In some embodiments, recombinant transcriptional regulators have missense mutations, truncations, or inserted sequences or entire domains from other naturally occurring proteins. A tagged recombinant transcriptional regulator may be used in some embodiments of the methods of the present invention, as the tag may facilitate the immunoprecipitation of the regulator.

In certain embodiments of the invention, transcriptional regulators comprise specific transcription factors, coactivators, corepressors or complexes thereof. Transcription factors bind to specific cognate DNA elements such as promoters, enhancers and silencer elements, and are responsible for regulating gene expression. Transcription factors may be activators of transcription, repressors of transcription or both, depending on the cellular context. Transcription factors may belong to any class or type of known or identified transcription factor. Examples of known families or structurally-related transcription factors include helix-loop-helix, leucine zipper, zinc finger, ring finger, and hormone receptors. Transcription factors may also be selected based upon their known association with a disease or the regulation of one or more genes. For example, transcription factors such as c-myc, Rel/Nf-kB, neuroD, c-fos, c-jun, and E2F may be targeted. Antibodies directed to any transcriptional coactivator or corepressor may also be used according to the invention. Examples of specific coactivators include CBP, CTIIA, and SRA, while specific examples of corepressors include the mSin3 proteins, MITR, and LEUNIG. Furthermore, the genes regulated by proteins associated with transcriptional complexes, such as the histone acetylases (HATs) and histone deacetylases (HDACs), may also be determined using the methods described herein.

In one embodiment of the methods described herein, the cell is a primary cell. Primary cells are directly isolated from an organism and have undergone minimum passaging in vitro, and thus maintain most of the phenotypic characteristics of cells in the organism. In a specific embodiment, the primary cells are primary cells that have doubled fewer than 10 times ex vivo. The cell type used in the assays described herein may be any cell type. The cell may be eukaryotic or prokaryotic, from a metazoan or from a single-celled organism such as yeast. In some preferred embodiments the cell is a mammalian cell, such as a cell from a rodent, a primate or a human. The cell may be a wild-type cell or a cell that has been genetically modified by recombinant means or by exposure to mutagens. The cell may be a transformed cell or an immortalized cell. In some embodiments, the cell is from an organism afflicted by a disease. In some embodiments, the cell comprises a genetic mutation that results in disease, such as in a hyperplastic condition.

In some embodiments, the cell is derived from transplant-grade tissue or freshly isolated tissue. In some embodiments, the cell is derived from a tissue biopsy, such as from a subject afflicted with, or suspected of being afflicted with, a disorder. In another embodiment, the cell is isolated from a bodily fluid or bodily secretion, including serum, plasma, saliva, tears, sweat, semen, amniotic fluid, vaginal secretions, nasal secretions, synovial fluid, spinal fluid, phlegm, bronchoalveolar lavage fluid, blister fluid, pus, stool and intracranial fluid. The cell may be a live cell or a cell that has been preserved, such as by treatment with formalin, B5, Zenker's fixatives, Lugol's solution, Carnoy's Fixative, F13 fixative, or other preservatives, or a cell that has been preserved by freezing.

In some embodiments of the methods described herein, the cell has been treated with an agent, such as a compound or a drug, prior to isolation of chromatin. Some preferred agents include those which bind to or regulate the expression of transcriptional regulators. In some embodiments, the genes that are regulated by a given transcriptional regulator are determined both in a cell that is contacted with an agent and in a cell that is not contacted with the agent, or that is contacted with a different amount of the agent. Such methods may be used to identify compounds that alter the types of genes and/or the extent to which a transcriptional regulator controls transcription of those genes. Furthermore, such approaches may be used to screen for agents which alter the activity, specificity or expression of a transcriptional regulator.

In some embodiments of the methods described herein for identifying genes regulated by a transcriptional regulator, a higher level of hybridization by the amplified chromatin fragments than by the amplified control chromatin comprises at least a two-fold higher level of hybridization. The threshold for what constitutes a higher level of hybridization may be adjusted by one skilled in the art for the particular application. Higher thresholds are expected to yield a smaller set of targets but with higher certainty that a given gene above that threshold is regulated by the transcriptional regulator in that cell in vivo.
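The effect of the threshold may be illustrated with a minimal sketch. The function name, gene identifiers and signal values below are hypothetical; the sketch assumes one hybridization value per gene for the amplified chromatin fragments and one for the amplified control chromatin, and it simply skips spots with no positive control signal.

```python
def call_bound_genes(ip_signal, control_signal, min_fold=2.0):
    """Return genes whose fragment/control hybridization ratio meets min_fold."""
    bound = []
    for gene, ip in ip_signal.items():
        ctrl = control_signal[gene]
        # Spots without a positive control signal are skipped (an assumption).
        if ctrl > 0 and ip / ctrl >= min_fold:
            bound.append(gene)
    return sorted(bound)

# Hypothetical signals: ratios are 3.0, 1.25 and 2.0 respectively.
ip_signal = {"G1": 900, "G2": 500, "G3": 300}
control_signal = {"G1": 300, "G2": 400, "G3": 150}

default_calls = call_bound_genes(ip_signal, control_signal)
strict_calls = call_bound_genes(ip_signal, control_signal, min_fold=3.0)
print(default_calls, strict_calls)
```

In this toy example, raising the threshold from two-fold to three-fold shrinks the set of called genes, which is the trade-off between target-set size and certainty noted above.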

In other embodiments of the methods described herein for identifying genes regulated by a transcriptional regulator, the transcriptional regulator is a basal transcription factor or a component of the basal transcription machinery. In specific embodiments, components of the basal transcription machinery comprise RNA polymerases, including pol I, pol II and pol III, TBP, NTF-1 and Sp1 and any other component of TFIID, including, for example, the TAFs (e.g. TAF250, TAF150, TAF135, TAF95, TAF80, TAF55, TAF31, TAF28, and TAF20), or any other component of a polymerase holoenzyme.

Another aspect of the invention provides a method of identifying transcriptionally active genes that are regulated by a transcriptional regulator in a cell. The method comprises determining what genes are regulated by the transcriptional regulator and determining which ones are transcriptionally active in the cell. In one embodiment, a set of genes which are transcriptionally active is the set of genes whose promoters are bound by an RNA polymerase, such as RNA polymerase II, or by a member of the basal transcription machinery. Alternatively, genes which are transcriptionally active may be identified using other techniques known in the art. For example, mRNA from a cell which expresses the transcriptional regulator can be collected and examined on a DNA microarray which comprises coding sequences in order to determine which genes are being transcribed.

In one embodiment, the invention provides a method of identifying transcriptionally active genes that are regulated by a transcriptional regulator in a cell, the method comprising (a) selectively isolating chromatin from a tissue; (b) identifying promoter regions from the chromatin that are bound by the transcriptional regulator; (c) identifying promoter regions from the chromatin that are bound by a member of the basal transcriptional machinery; and (d) comparing the promoter regions identified in steps (b) and (c) to determine overlapping genes, wherein the overlapping genes are transcriptionally active genes regulated by the transcriptional regulator.
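Step (d) amounts to a set intersection, which may be sketched as follows. The function name and the gene lists are hypothetical placeholders; the sketch assumes that steps (b) and (c) have each already produced a collection of gene identifiers.

```python
def active_regulated_genes(bound_by_regulator, bound_by_basal):
    """Genes whose promoters are bound by both the regulator and the basal machinery."""
    return set(bound_by_regulator) & set(bound_by_basal)

# Hypothetical outputs of steps (b) and (c).
bound_by_regulator = ["G1", "G2", "G5"]
bound_by_basal = ["G2", "G3", "G5", "G7"]

overlap = active_regulated_genes(bound_by_regulator, bound_by_basal)
print(sorted(overlap))
```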

In a related aspect, the invention provides methods to determine if a transcriptional regulator is a global transcriptional regulator. One method comprises estimating if a transcriptional regulator is a global transcriptional regulator, the method comprising (a) selectively isolating chromatin from a tissue; (b) identifying promoter regions from the chromatin which are bound by a candidate global transcriptional regulator; (c) identifying promoter regions from the chromatin which are bound by a member of the basal transcriptional machinery; and (d) comparing the promoter regions identified in steps (b) and (c) to determine the ratio between (i) the number of promoter regions bound by both the candidate global transcriptional regulator and the member of the basal transcriptional machinery; and (ii) the number of promoter regions bound by the member of the basal transcriptional machinery; wherein a transcriptional regulator is a global transcriptional regulator when the ratio is greater than 0.2.
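The ratio test of step (d) may be sketched as shown below. The function names and promoter identifiers are hypothetical; only the 0.2 cutoff is taken from the text above.

```python
def global_regulator_ratio(bound_by_candidate, bound_by_basal):
    """Fraction of basal-machinery-bound promoters also bound by the candidate."""
    basal = set(bound_by_basal)
    if not basal:
        raise ValueError("no promoters bound by the basal machinery")
    return len(set(bound_by_candidate) & basal) / len(basal)

def is_global_regulator(bound_by_candidate, bound_by_basal, cutoff=0.2):
    """Apply the ratio > cutoff rule (0.2 in the text above)."""
    return global_regulator_ratio(bound_by_candidate, bound_by_basal) > cutoff

# Hypothetical data: ten active promoters, three of which the candidate binds.
basal = [f"P{i}" for i in range(10)]
candidate = ["P0", "P4", "P9", "P42"]  # P42 is not bound by the basal machinery

ratio = global_regulator_ratio(candidate, basal)
is_global = is_global_regulator(candidate, basal)
print(ratio, is_global)
```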

In a preferred embodiment of the methods described above, steps (b) and (c) are performed using a DNA microarray. In a specific embodiment, the DNA microarray comprises (i) at least 10,000 experimental spots, each experimental spot comprising an experimental DNA, each experimental DNA comprising a promoter region from a human gene in the subset; and (ii) at least 100 control spots, each control spot comprising a control DNA, each control DNA comprising a non-promoter region. Any type of microarray or array may be used.

In one embodiment of the methods described above, the member of the transcriptional machinery is an RNA polymerase, such as RNA polymerase II, a TATA-binding protein, or any other component of TFIID, including, for example, the TAFs (e.g. TAF250, TAF150, TAF135, TAF95, TAF80, TAF55, TAF31, TAF28, and TAF20).

Another aspect of the invention provides methods of identifying regulatory networks, or pathways, in a cell. The methods provided by the invention allow the identification of the regulatory motifs, such as those shown in FIG. 2B. A regulatory pathway can include, for example, a pathway that controls a cellular function under a specific condition. A regulatory pathway controls a cellular function by, for example, altering the activity of a system component or the activity of a biochemical, gene expression or other type of pathway. Alterations in activity include, for example, inducing a change in the expression, activity, or physical interactions of a pathway component under a specific condition. Specific examples of regulatory pathways include a pathway that activates a cellular function in response to an environmental stimulus of a biochemical system, such as the inhibition of cell differentiation in response to the presence of a cell growth signal and the activation of galactose import and catalysis in response to the presence of galactose and the absence of repressing sugars. The term “component” when used in reference to a network or pathway is intended to mean a molecular constituent of the biochemical system, network or pathway, such as, for example, a polypeptide, nucleic acid, other macromolecule or other biological molecule.

In one aspect, the invention provides a method of identifying a transcriptional regulatory network in a cell, the method comprising determining if a transcriptional regulator regulates additional transcriptional regulators in the cell, such as by using any of the methods described herein, wherein a transcriptional regulatory network is identified if at least one additional transcriptional regulator is regulated by the transcriptional regulator.

Another aspect of the invention provides a method of identifying a transcriptional regulatory network in a cell, the method comprising determining if a transcriptional regulator regulates (i) its own promoter; or (ii) a promoter from a plurality of transcriptional regulators; such as by using any of the methods described herein, wherein the experimental DNA comprises (a) a promoter from the transcriptional regulator; and (b) promoters from the plurality of transcriptional regulators; wherein a transcriptional regulatory network is identified if the transcriptional regulator regulates itself or if it regulates at least one of the plurality of transcriptional regulators.

Yet another aspect of the invention provides a method of identifying transcriptional regulatory networks in a cell, the method comprising (a) determining, by repeating one of the methods described herein for each of a plurality of transcriptional regulators, the genes in a subset which are regulated by each of the plurality of transcriptional regulators, wherein the experimental DNA comprises promoter regions for each of the plurality of transcriptional regulators; and (b) determining if any one of the plurality of transcriptional regulators is regulated by at least one of the plurality of transcriptional regulators; wherein a transcriptional regulatory network is identified if any one of the plurality of transcriptional regulators is regulated by at least one of the plurality of transcriptional regulators.

Specific embodiments of the methods for identifying regulatory networks described herein further comprise determining if any of the genes regulated by one of the plurality of transcriptional regulators is also a target of any of the other transcriptional regulators.

The invention further provides algorithms for the identification of regulatory motifs, which may be used in conjunction with any of the methods provided herein, such as the methods for identifying the genes regulated by a transcriptional regulator. In a specific embodiment, two data matrices are created. The overall matrix D consists of binary entries Dij, where a 1 indicates binding of regulator j to intergenic region i, and a 0 indicates no binding event. The regulator matrix R is a subset of D, containing only the rows corresponding to the intergenic region assigned to each regulator, in the same order as the columns of regulators. The analyses may be performed using Matlab® software. The algorithms to find each motif are described as follows:

Autoregulatory motif: Find each non-zero entry on the diagonal of R.
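As an illustrative sketch only (written in Python rather than Matlab®), the diagonal scan may look as follows. The regulator names and the matrix are hypothetical; R is assumed to be a square list of lists in which R[i][j] is 1 when regulator j binds the intergenic region assigned to regulator i.

```python
def autoregulated(R, names):
    """Return regulators with a non-zero entry on the diagonal of R."""
    return [names[i] for i in range(len(R)) if R[i][i] != 0]

names = ["A", "B", "C"]
R = [
    [1, 0, 0],  # promoter of A: bound by A
    [1, 0, 0],  # promoter of B: bound by A
    [0, 1, 1],  # promoter of C: bound by B and C
]

auto = autoregulated(R, names)
print(auto)
```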

Feedforward loop: For each master regulator (column of R), find non-zero entries, which correspond to regulators bound. For each master regulator/secondary regulator pair, find all rows in D bound by both regulators.
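A minimal sketch of this search is given below. The matrices are hypothetical; the first three rows of D are assumed to be the intergenic regions assigned to regulators A, B and C, so that R is those rows. Skipping the case in which the secondary regulator is the master itself is an assumption of the sketch, not a requirement stated above.

```python
def feedforward_loops(D, R, names):
    """Map (master, secondary) pairs to the rows of D bound by both regulators."""
    loops = {}
    for m in range(len(names)):          # master regulator = column of R
        for s in range(len(names)):      # candidate secondary regulator
            if s == m or not R[s][m]:    # master must bind the secondary's promoter
                continue
            shared = [i for i, row in enumerate(D) if row[m] and row[s]]
            if shared:
                loops[(names[m], names[s])] = shared
    return loops

names = ["A", "B", "C"]
D = [
    [0, 0, 0],  # row 0: promoter of A
    [1, 0, 0],  # row 1: promoter of B, bound by A
    [0, 0, 0],  # row 2: promoter of C
    [1, 1, 0],  # row 3: target bound by A and B
    [1, 0, 0],  # row 4: target bound by A only
    [1, 1, 1],  # row 5: target bound by A, B and C
]
R = D[:3]

ffl = feedforward_loops(D, R, names)
print(ffl)
```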

Multi-component loop: For each regulator (column of R), find the regulators to which it binds. For each of these, find the regulators it binds. If any of these are the original regulator, you have a multi-component loop of two. For all others, find regulators to which they bind. If any of these are the original, you have a multi-component loop of three. Repeat to find larger loops.
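The iterative search described above can be expressed as a bounded depth-first search, sketched below under stated assumptions: R[i][j] is 1 when regulator j binds the promoter of regulator i, self-binding is ignored, and each loop is reported once as a tuple of regulator indices rotated so that the smallest index comes first. The matrix and the max_size parameter are hypothetical.

```python
def multicomponent_loops(R, max_size=3):
    """Find directed loops of 2..max_size regulators in the regulator matrix R."""
    n = len(R)
    # targets[j]: regulators whose promoters are bound by regulator j (self excluded)
    targets = [[i for i in range(n) if R[i][j] and i != j] for j in range(n)]
    loops = set()

    def extend(path):
        for nxt in targets[path[-1]]:
            if nxt == path[0]:
                # Back at the starting regulator: record the loop once,
                # rotated so the smallest index is first.
                k = path.index(min(path))
                loops.add(tuple(path[k:] + path[:k]))
            elif nxt not in path and len(path) < max_size:
                extend(path + [nxt])

    for start in range(n):
        extend([start])
    return sorted(loops)

# Hypothetical regulators A=0, B=1, C=2, D=3 with A<->B and B->C->D->B.
R = [
    [0, 1, 0, 0],  # promoter of A: bound by B
    [1, 0, 0, 1],  # promoter of B: bound by A and D
    [0, 1, 0, 0],  # promoter of C: bound by B
    [0, 0, 1, 0],  # promoter of D: bound by C
]

found = multicomponent_loops(R)
print(found)
```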

Single input module: Find the intergenic regions bound by only one regulator. That is, take the subset of rows of D such that the sum of each row is 1. Then for each regulator (column), find non-zero entries. Each set (greater than three intergenic regions) is a SIM.
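A short sketch of this selection follows. The matrix is hypothetical, and the default of at least four regions per module is this sketch's reading of "greater than three intergenic regions".

```python
def single_input_modules(D, min_regions=4):
    """Group rows of D bound by exactly one regulator, keyed by that regulator."""
    modules = {}
    for i, row in enumerate(D):
        if sum(row) == 1:
            modules.setdefault(row.index(1), []).append(i)
    # Keep only sets with more than three intergenic regions.
    return {reg: rows for reg, rows in modules.items() if len(rows) >= min_regions}

D = [
    [1, 0], [1, 0], [1, 0], [1, 0],  # rows 0-3: bound only by regulator 0
    [0, 1], [0, 1],                  # rows 4-5: bound only by regulator 1
    [1, 1],                          # row 6: bound by both, so not a SIM row
]

sims = single_input_modules(D)
print(sims)
```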

Multi-input module: Find the intergenic regions bound by more than one regulator. That is, take the subset of rows of D such that the sum of each row is greater than 1. Then, for each row, find any other row bound by the same regulators. The collection of rows bound by the same regulators corresponds to a MIM. Once a row is assigned to a MIM, remove it from further analysis.
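One way to sketch this grouping is to key each row by the exact set of regulators bound to it, which assigns every row to a single module and so plays the role of removing assigned rows from further analysis. The matrix is hypothetical, and requiring at least two rows per module is an assumption drawn from the phrase "any other row".

```python
def multi_input_modules(D, min_rows=2):
    """Group rows of D bound by the same set of two or more regulators."""
    modules = {}
    for i, row in enumerate(D):
        bound = tuple(j for j, v in enumerate(row) if v)
        if len(bound) > 1:
            modules.setdefault(bound, []).append(i)
    return {regs: rows for regs, rows in modules.items() if len(rows) >= min_rows}

D = [
    [1, 1, 0],  # row 0: regulators 0 and 1
    [1, 1, 0],  # row 1: regulators 0 and 1
    [0, 1, 1],  # row 2: regulators 1 and 2 (no partner row)
    [1, 0, 0],  # row 3: single regulator, ignored
    [1, 1, 0],  # row 4: regulators 0 and 1
]

mims = multi_input_modules(D)
print(mims)
```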

Regulator chain: For each regulator (column of R), use a recursive algorithm to find chains of all lengths. That is, for each regulator whose promoter is bound by the regulator before it in the chain, find the regulator promoters to which it binds. Repeat until the chain ends. There are three possible ways to end a chain: a regulator that does not bind to the promoter of any other regulator, a regulator that binds to its own promoter, or one that binds to the promoter of another regulator earlier in the chain.
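The recursion may be sketched as follows. This is one possible reading of the termination rules: binding to a regulator's own promoter or to the promoter of an earlier regulator is simply not followed, and a chain is reported when it cannot be extended by any new regulator. The matrix is hypothetical, with R[i][j] equal to 1 when regulator j binds the promoter of regulator i.

```python
def regulator_chains(R):
    """Return maximal chains (tuples of regulator indices) found by recursion."""
    n = len(R)
    chains = []

    def extend(chain):
        last = chain[-1]
        # Regulators whose promoters are bound by the last regulator,
        # excluding itself and any regulator already in the chain.
        nxt = [i for i in range(n) if R[i][last] and i not in chain]
        if not nxt:
            if len(chain) > 1:
                chains.append(tuple(chain))
            return
        for i in nxt:
            extend(chain + [i])

    for start in range(n):
        extend([start])
    return chains

# Hypothetical regulators A=0, B=1, C=2: A binds B, B binds C, C binds itself.
R = [
    [0, 0, 0],  # promoter of A
    [1, 0, 0],  # promoter of B: bound by A
    [0, 1, 1],  # promoter of C: bound by B and by C
]

chains = regulator_chains(R)
print(chains)
```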

In one preferred embodiment of any of the methods described herein, such as the methods for identifying regulatory networks, the experimental DNA in the microarray comprises promoter regions from additional transcriptional regulators or from genes suspected to encode transcriptional regulators. Such a microarray enables one skilled in the art to identify the components of a regulatory pathway. For example, starting with one transcriptional regulator, a subset of the genes it regulates is identified using any method, such as those described herein. If one identified gene is itself a second transcriptional regulator or is suspected to encode a transcriptional regulator, then the subset of genes the second transcriptional regulator regulates is identified, and so on. Furthermore, the subsets of genes that the first and second transcriptional regulators regulate can be compared to determine if any genes are found in both subsets. If so, then a feed-forward motif, a unit of a regulatory network, has been identified. Likewise, if the second transcriptional regulator is found to regulate the first one, then a feedback loop has been identified.

IV. Development of a Therapeutic to Treat or Prevent Disorders

One aspect of the invention provides methods of identifying targets for the development of therapeutics. One aspect of the invention provides a method of identifying at least one target gene for the development of a therapeutic to treat or prevent a disorder in a subject, wherein at least one form of the disorder is caused by an altered activity in a transcriptional regulator or in a suspected transcriptional regulator, the method comprising (a) identifying the genes regulated by the transcriptional regulator in a cell; (b) determining if the transcriptional regulator is a broad-acting transcriptional regulator or a narrow-acting transcriptional regulator, wherein if the transcriptional regulator is a broad-acting transcriptional regulator then the transcriptional regulator is a target gene for the development of a therapeutic, and wherein if the transcriptional regulator is a narrow-acting transcriptional regulator then (i) determining if at least one gene regulated by the transcriptional regulator is likely causative in the disorder, wherein a gene that is likely causative in the disorder is a target gene for the development of a therapeutic; and (ii) reiterating steps (a) and (b) for at least one gene that is regulated by the transcriptional regulator in the cell and that either (1) encodes a transcriptional regulator or (2) is suspected to encode a transcriptional regulator, with the modification that the transcriptional regulator of steps (a) and (b) is said gene, thereby identifying at least one target gene for the development of a therapeutic to treat or prevent a disorder in the subject.
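The reiteration in steps (a) and (b) may be sketched as a recursive traversal. Every name and data value below is hypothetical: targets_of stands in for the result of step (a), broad for the outcome of step (b), causative for genes judged likely causative, and regulators for genes that encode or are suspected to encode transcriptional regulators. The seen set, which prevents revisiting a regulator in a feedback loop, is an addition of this sketch.

```python
def therapeutic_targets(regulator, targets_of, broad, causative, regulators, seen=None):
    """Collect candidate target genes starting from one transcriptional regulator."""
    seen = set() if seen is None else seen
    if regulator in seen:
        return set()
    seen.add(regulator)
    if regulator in broad:
        # A broad-acting regulator is itself the target gene.
        return {regulator}
    found = set()
    for gene in targets_of.get(regulator, ()):
        if gene in causative:            # step (i)
            found.add(gene)
        if gene in regulators:           # step (ii): reiterate on this gene
            found |= therapeutic_targets(
                gene, targets_of, broad, causative, regulators, seen
            )
    return found

targets_of = {"TF1": ["G1", "TF2", "G2"], "TF2": ["G3", "TF1"]}
broad = {"TF2"}
causative = {"G2"}
regulators = {"TF1", "TF2"}

candidates = therapeutic_targets("TF1", targets_of, broad, causative, regulators)
print(sorted(candidates))
```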

In some embodiments of the methods for identifying a target gene for the development of a therapeutic, the genes regulated by the transcriptional regulator in the cell are identified using chromosome-wide location analysis, analysis of mRNA transcripts in a cell that expresses the transcriptional regulator, or by using any of the methods provided herein for the identification of the genes that are regulated by a transcriptional regulator. Some methods may comprise the use of DNA microarrays or DNA arrays, such as those described in Gabrielson et al., Obesity Research, 8(5), 374-384 (2000).

In some embodiments of the methods described herein for identifying a target gene for the development of a therapeutic, the transcriptional regulator is a master regulatory gene. In specific embodiments, the master regulatory gene is SOX1-18, OCT6, PAX3, Myocardin, GATA1-6, TCF1/HNF1A, HNF4A, HNF6, NGN3, C/EBP, FOXA1-3, IPF1, GATA, HNF3, NKX2.1, CDX, FTF/NR5A2, C/EBPbeta, SCL1, SKIN1, or a member of the neurogenin, LK, LMO, SOX, OCT, PAX, GATA or MyoD family of transcription factors.

In some embodiments of the methods described herein, the transcriptional regulator is PAX3, EGR-1, EGR-2, OCT6, a SOX family member, a GATA family member, a PAX family member, an OCT family member, RFX5, WHN, GATA1, VDR, CRX, CBP, MeCP2, AML1, p53, PLZF, PML, Rb, WT1, NR3C2, GCCR, PPARgamma, SIM1, HNF1alpha, HNF1beta, HNF4alpha, PDX1, MAFA, FOXA2, or NEUROD1.

A transcriptional regulator whose altered activity can lead to disease might be expressed in multiple, or all, tissues of an organism, such that any of multiple cell types may be used in identifying a therapeutic. In some embodiments of the methods described herein for identifying a target gene for the development of a therapeutic, the cell is derived from a tissue whose function is impaired in the disorder. For example, a pancreatic cell may be used for diabetes, a cardiac muscle cell for myocardial infarction, or a neuron for Alzheimer's disease.

In specific embodiments of the methods described herein for identifying a target gene for the development of a therapeutic, the broad acting gene regulates at least about 1%, 2% or more preferably at least about 2.5% of the genes in the cell, and the narrow acting gene regulates less than about 1%, 2% or 2.5% of the genes in the cell.

In specific embodiments of the methods described herein, a gene is suspected to encode a transcriptional regulator if it shares at least about 30%, 40% or 50% amino acid sequence identity within at least the DNA binding domain of a transcriptional regulator. DNA binding domains and methods of performing nucleic acid and polypeptide sequence alignments are well-known in the art. Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2: 482 (1981); by the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48: 443 (1970); by the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. 85: 2444 (1988); by computerized implementations of these algorithms, including, but not limited to: CLUSTAL in the PC/Gene program by Intelligenetics, Mountain View, Calif., GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 7 Science Dr., Madison, Wis., USA; the CLUSTAL program is well described by Higgins and Sharp, Gene, 73: 237-244, 1988; Higgins and Sharp, CABIOS:11-13, 1989; Corpet, et al., Nucleic Acids Research, 16:881-90, 1988; Huang, et al., Computer Applications in the Biosciences 8:1-7, 1992; and Pearson, et al., Methods in Molecular Biology 24:7-331, 1994.
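As a non-limiting sketch of how a percent-identity figure of this kind may be obtained, the following Python code performs a simple global alignment in the manner of Needleman and Wunsch and counts identical aligned positions. The scoring values (match +1, mismatch -1, gap -1), the definition of identity as identical positions divided by alignment length, the function names and the toy sequences are all assumptions made for illustration; practical work would typically use an established alignment package with a substitution matrix.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; return the two gapped strings."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical aligned residues as a percentage of alignment length."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    if not aligned_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


def is_suspected_regulator(known_domain, candidate_domain, threshold=30.0):
    """Apply an identity cutoff (30% here, one of the recited values)."""
    return percent_identity(known_domain, candidate_domain) >= threshold


# Toy sequences, not real DNA binding domains.
print(needleman_wunsch("ACDE", "ACE"))
print(percent_identity("ACDE", "ACE"))
```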

In some specific embodiments of the methods described herein for identifying a target gene for the development of a therapeutic, the gene regulated by the transcriptional regulator is said to be likely causative of the disorder if a mutation in said gene results in at least one phenotype or symptom associated with the disorder. In another specific embodiment, the gene regulated by the transcriptional regulator is said to be likely causative of the disorder when the gene encodes an enzyme or signaling molecule which functions in a pathway that is impaired in the disorder. For example, if the disease is type II diabetes, a disorder characterized by hyperglycemia, then a gene regulated by the transcriptional regulator which encodes a sugar transporter, an enzyme involved in catalyzing a step of glycolysis or gluconeogenesis, or a gene which regulates insulin production, secretion or signaling is said to be likely causative of the disorder. In another specific embodiment, the gene regulated by the transcriptional regulator is said to be likely causative of the disorder if a mutant allele of the gene is genetically linked to a “susceptibility locus” for at least one form of the disease. A “susceptibility locus” for a particular disease is a sequence or gene locus implicated in the initiation or progression of the disease. The susceptibility locus can be, for example, a gene or a microsatellite repeat, as identified by a microsatellite marker, or can be identified by a defined single nucleotide polymorphism. Generally, susceptibility genes implicated in specific diseases and their loci can be found in scientific publications, but may also be determined experimentally.

In some embodiments of the methods described herein for identifying a target gene for the development of a therapeutic, the altered activity in the transcriptional regulator comprises at least one of the following: (a) an alteration in the binding affinity of the transcriptional regulator to DNA; (b) an alteration in the ability of the transcriptional regulator to bind to RNA polymerase, to an RNA polymerase holoenzyme, or to a second transcriptional regulator; (c) an alteration in the binding affinity of the transcriptional regulator to a ligand; (d) an alteration in expression level or expression pattern of the transcriptional regulator; or (e) an alteration in an ability of the transcriptional regulator to form homomultimers or heteromultimers.

In some embodiments of the methods described herein, the cell comprises a mutant form of the transcriptional regulator. A preferred mutant form of the transcriptional regulator is one that causes the disease for which the therapeutic is sought. Such embodiments are particularly preferred when a mutant transcriptional regulator which causes at least one form of the disease has an altered target specificity, such that the genes it regulates, or the extent to which it regulates their transcription, are altered when compared to the non-mutant form of the transcriptional regulator. Such embodiments may allow the identification of therapeutic targets which might not have been identified if a wild-type form of the transcriptional regulator had been used. Mutations in the DNA binding domain, for example, may alter the target specificity of a transcriptional regulator by altering its affinity for various DNA binding sequences.

It is well-known to one skilled in the art that mutations in a transcriptional regulator may result in a hypomorphic, hypermorphic or neomorphic phenotype. Mutations may generally reduce the activity of a transcriptional regulator, may generally increase its activity, or may confer novel properties, such as altering the range of targets or turning an activator into a repressor or vice versa. In any methods described herein, and in particular those for identifying the therapeutics, a cell expressing a transcriptional regulator having any of these changes in activity may be used.

The methods described herein may be applied to any disorder for which a transcriptional regulator has been implicated. Examples of diseases and transcriptional regulators which cause them may be found in the scientific and medical literature by one skilled in the art, including in Medical Genetics, L. V. Jorde et al., Elsevier Science 2003, and Principles of Internal Medicine, 15th edition, ed by Braunwald et al., McGraw-Hill, 2001; American Medical Association Complete Medical Encyclopedia (Random House, Incorporated, 2003); and The Mosby Medical Encyclopedia, ed by Glanze (Plume, 1991). In some embodiments, the disorder is characterized by impaired function of at least one of the following: brain, spinal cord, heart, arteries, esophagus, stomach, small intestine, large intestine, liver, pancreas, lungs, kidney, urinary tract, ovaries, breasts, uterus, testis, penis, colon, prostate, bone, muscle, cartilage, thyroid gland, adrenal gland, pituitary, bone marrow, blood, thymus, spleen, lymph nodes, skin, eye, ear, nose, teeth or tongue.

In some embodiments of the methods described herein for identifying a target gene for the development of a therapeutic, the subject is a mammal. In preferred embodiments, the subject is a human. In some embodiments of the methods described herein for identifying a target gene for the development of a therapeutic, the therapeutic comprises a small molecule drug, an antisense nucleic acid, an antibody, a peptide, a ligand, a fatty acid, a hormone or a metabolite.

Antisense nucleic acids acting by RNAi include oligonucleotides which specifically hybridize (e.g., bind) under cellular conditions with a gene sequence, such as at the cellular mRNA and/or genomic DNA level, so as to inhibit expression of that gene, e.g., by inhibiting transcription and/or translation. The binding may be by conventional base pair complementarity, or, for example, in the case of binding to DNA duplexes, through specific interactions in the major groove of the double helix. Preferred antisense nucleic acids comprise siRNAs, shRNAs, or any other form of double stranded RNA molecule. Antisense nucleic acids may be chemically modified, such as to increase their in vivo stability.

RNAi is a process of sequence-specific post-transcriptional gene repression which can occur in eukaryotic cells. In general, this process involves degradation of an mRNA of a particular sequence induced by double-stranded RNA (dsRNA) that is homologous to that sequence. For example, the expression of a long dsRNA corresponding to the sequence of a particular single-stranded mRNA (ss mRNA) will labilize that message, thereby “interfering” with expression of the corresponding gene. Accordingly, any selected gene may be repressed by introducing a dsRNA which corresponds to all or a substantial part of the mRNA for that gene. It appears that when a long dsRNA is expressed, it is initially processed by a ribonuclease III into shorter dsRNA oligonucleotides of, in some instances, as few as 21 to 22 base pairs in length. Furthermore, RNAi may be effected by introduction or expression of relatively short homologous dsRNAs. dsRNAs shorter than about 30 base pairs are preferred to effect gene repression by RNAi (see Hunter et al. (1975) J Biol Chem 250: 409-17; Manche et al. (1992) Mol Cell Biol 12: 5239-48; Minks et al. (1979) J Biol Chem 254: 10180-3; and Elbashir et al. (2001) Nature 411: 494-8).

Antibodies include whole antibodies, e.g., of any isotype (IgG, IgA, IgM, IgE, etc.), and include fragments thereof which are also specifically reactive with a vertebrate, e.g., mammalian, protein. Antibodies may be fragmented using conventional techniques and the fragments screened for utility in the same manner as described above for whole antibodies. Thus, the term includes segments of proteolytically-cleaved or recombinantly-prepared portions of an antibody molecule that are capable of selectively reacting with a certain protein. Non-limiting examples of such proteolytic and/or recombinant fragments include Fab, F(ab′)2, Fab′, Fv, and single chain antibodies (scFv) containing a V[L] and/or V[H] domain joined by a peptide linker. The scFv's may be covalently or non-covalently linked to form antibodies having two or more binding sites. The subject invention includes polyclonal, monoclonal, humanized, or other purified preparations of antibodies and recombinant antibodies.

Peptidomimetics include compounds containing peptide-like structural elements that are capable of mimicking the biological action(s) of a natural parent polypeptide.

Hormones include any one of a number of biochemical substances that are produced by a certain cell or tissue and that cause a specific biological change or activity to occur in another cell or tissue located elsewhere in the body.

Metabolites include any substance produced by metabolism or by a metabolic process. “Metabolism”, as used herein, refers to the various chemical reactions involved in the transformation of molecules or chemical compounds occurring in tissue and the cells therein.

Ligands include any substance which binds to a receptor protein. A ligand of a transcriptional regulator protein is a substance which binds to the regulator protein, such as estrogen binding to a nuclear hormone receptor. In a preferred embodiment, ligand binding to a transcriptional regulator occurs with high affinity. The term ligand refers to substances including, but not limited to, a natural ligand, whether isolated and/or purified, synthetic, and/or recombinant, and a homolog of a natural ligand (e.g., from another mammal). The term ligand encompasses substances which are inhibitors or promoters of receptor activity, as well as substances which selectively bind receptors but lack inhibitor or promoter activity.

Some aspects of the invention relate to the diagnosis of disease states. A “transcriptional fingerprint”, or listing of the genes that are regulated by a given transcriptional regulator, and optionally to what extent, can be generated from healthy individuals and from those afflicted with a disorder. Comparison of the fingerprints between the two groups may define genes which are specific to one of the two groups, and which thus serve as diagnostic markers indicating that a patient is at risk of, or is afflicted with, the disorder. In one embodiment, the transcriptional fingerprint of HNF4a is used to diagnose type II diabetes. A biopsy of a subject's liver or pancreas may provide the cells for such analysis.
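The comparison of fingerprints between the two groups may be sketched, purely for illustration, as set operations in Python. The sketch assumes each individual's fingerprint is reduced to a set of regulated gene names and treats a gene as group-specific only if it appears in every member of one group and in no member of the other; this strict rule, the function name and the gene labels are hypothetical, and a practical analysis would likely use statistical criteria and the extent of regulation as well.

```python
def group_specific_genes(healthy, affected):
    """Compare transcriptional fingerprints of two groups.

    Each argument is a non-empty list of sets, one set of regulated
    genes per individual (an assumed representation).
    """
    healthy_all = set.intersection(*healthy)    # in every healthy individual
    affected_all = set.intersection(*affected)  # in every affected individual
    healthy_any = set.union(*healthy)           # in at least one healthy individual
    affected_any = set.union(*affected)         # in at least one affected individual
    return {
        "healthy_only": healthy_all - affected_any,
        "affected_only": affected_all - healthy_any,
    }


# Hypothetical fingerprints.
healthy = [{"G1", "G2", "G3"}, {"G1", "G2", "G4"}]
affected = [{"G2", "G5"}, {"G2", "G3", "G5"}]
markers = group_specific_genes(healthy, affected)
print(markers)
```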

In specific embodiments, the transcriptional fingerprint disease diagnosis analysis is applied to transcriptional regulators which are causative in a particular disease to diagnose the disease. This approach may be coupled to allelic genotyping of the transcriptional regulator gene in the subject. For example, genotyping of a subject's HNF4a may uncover a novel allele. By using the “transcriptional fingerprint” of HNF4a in tissue from that patient, one skilled in the art may determine what effect that mutation has on HNF4a activity and thus diagnose type II diabetes.

5. Methods of Preventing/Treating Disease through Regulation of HNFs

Some aspects of the invention provide methods of treating or preventing disease by regulating transcriptional regulator activity, particularly that of the HNF family members. The invention provides a method of treating or preventing type II diabetes in a subject, comprising administering to the subject a therapeutically effective amount of an agent that increases the global transcriptional activity of HNF4alpha. U.S. Pat. No. 5,849,485, hereby incorporated by reference, describes methods and assays for the isolation of modulators of HNF-4a activity.

The invention also provides a method of treating or preventing a disorder associated with low transcriptional activity of HNF4alpha in a subject, comprising administering to the subject a therapeutically effective amount of an agent that increases the global transcriptional activity of HNF4alpha. In a related aspect, the invention provides a method of treating or preventing a disorder associated with high transcriptional activity of HNF4alpha in a subject, comprising administering to the subject a therapeutically effective amount of an agent that decreases the global transcriptional activity of HNF4alpha.

Yet another related aspect of the invention provides a method of increasing the global transcriptional activity in a liver or a pancreatic cell comprising contacting the cell with an agent which increases the global transcriptional activity of HNF4alpha. Similarly, the invention provides a method of decreasing the global transcriptional activity in a liver or a pancreatic cell comprising contacting the cell with an agent which decreases the global transcriptional activity of HNF4alpha.

Applicants have identified genes that are transcriptionally regulated by HNF-1a, HNF4a and HNF6 in hepatocytes and pancreatic cells. Accordingly, the invention provides methods of regulating the expression level of any of these genes in a cell or in a subject by contacting the cell with, or administering to the subject, an agent which modulates the expression level or transcriptional regulatory activity of HNF transcription factors.

The invention provides a method of regulating the expression level of any one of the genes in FIG. 13 in a hepatocyte, the method comprising contacting the cell with an agent which regulates the transcriptional activity of HNF1alpha. Similarly, the invention also provides a method of regulating the expression level of any one of the genes in FIG. 14 in a pancreatic cell, the method comprising contacting the cell with an agent which regulates the transcriptional activity of HNF1alpha.

The invention also provides a method of regulating the expression level of any one of the genes in FIG. 16 in a hepatocyte, the method comprising contacting the cell with an agent which regulates the transcriptional activity of HNF6. Similarly, the invention provides a method of regulating the expression level of any one of the genes in FIG. 17 in a pancreatic cell, the method comprising contacting the cell with an agent which regulates the transcriptional activity of HNF6.

The invention additionally provides a method of regulating the expression level of any one of the genes in FIG. 18 in a hepatocyte, the method comprising contacting the cell with an agent which regulates the transcriptional activity of HNF4alpha. Similarly, the invention provides a method of regulating the expression level of any one of the genes in FIG. 19 in a pancreatic cell, the method comprising contacting the cell with an agent which regulates the transcriptional activity of HNF4alpha.

Agents which modulate the transcriptional activity of HNF-4a, or any other HNF family member, may be identified by screening compounds for their ability to increase the expression level, the DNA binding activity or the transcriptional promoting activity of HNF4a. One assay format which can be used employs two genetic constructs. One is typically a plasmid that continuously expresses the transcriptional regulator of interest when transfected into an appropriate cell line. CV-1 cells are most often used. The second is a plasmid which expresses a reporter, e.g., luciferase, under control of the transcriptional regulator. For example, if a compound which acts as a ligand for HNF-4 is to be evaluated, one of the plasmids would be a construct that results in expression of the HNF-4 receptor in an appropriate cell line, e.g., the CV-1 cells. The second would possess a promoter linked to the luciferase gene in which an HNF-4 response element is inserted. If the compound to be tested is an agonist for the HNF-4 receptor, the ligand will complex with the receptor and the resulting complex binds the response element and initiates transcription of the luciferase gene. In time the cells are lysed and a substrate for luciferase is added. The resulting chemiluminescence is measured photometrically. Dose response curves are obtained and can be compared to the activity of known ligands. Reporters other than luciferase can be used, including CAT and other enzymes.
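One simple way to summarize a dose response curve from such a reporter assay is to estimate the concentration giving a half-maximal signal. The Python sketch below does so by linear interpolation on a log-concentration axis. It is an illustrative assumption rather than the analysis used in the referenced assays: it presumes an agonist-type curve in which the signal rises with dose, takes the lowest and highest readings as baseline and maximum, and uses hypothetical readings; curve fitting to a sigmoidal model would ordinarily be preferred.

```python
import math


def estimate_ec50(doses, responses):
    """Estimate the half-maximal effective concentration.

    `doses` are positive concentrations in ascending order and
    `responses` the matching reporter readings (assumed layout).
    Returns None if the half-maximal level is never crossed.
    """
    baseline, top = min(responses), max(responses)
    half = baseline + (top - baseline) / 2.0
    points = list(zip(doses, responses))
    for (d0, r0), (d1, r1) in zip(points, points[1:]):
        if r0 <= half <= r1 and r1 != r0:
            # Interpolate between neighbouring points on a log10 dose scale.
            frac = (half - r0) / (r1 - r0)
            log_ec50 = math.log10(d0) + frac * (math.log10(d1) - math.log10(d0))
            return 10 ** log_ec50
    return None


# Hypothetical luminescence readings at five molar concentrations.
doses = [1e-9, 1e-8, 1e-7, 1e-6, 1e-5]
readings = [100, 300, 1100, 1900, 2100]
ec50 = estimate_ec50(doses, readings)
print(ec50)
```

Values estimated this way for a test compound and for a known ligand can then be compared on the same scale.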

Viral constructs can be used to introduce the gene for the receptor and the reporter. A usual viral vector is an adenovirus. For further details concerning this preferred assay, see U.S. Pat. No. 4,981,784 issued Jan. 1, 1991, hereby incorporated by reference, and Evans et al., WO88/03168 published on 5 May 1988, also incorporated by reference.

HNF-4a antagonists can be identified using this same basic “agonist” assay. A fixed amount of an agonist is added to the cells with varying amounts of test compound to generate a dose response curve. If the compound is an antagonist, expression of luciferase is suppressed.

Additional methods for the isolation of agonists and antagonists of HNF transcription factors are described in U.S. Pat. Nos. 6,187,533 and 5,620,887. Additional U.S. patents describing methods to identify agents that modulate the activity of transcription factors include U.S. Pat. Nos. 5,804,374 and 5,298,429, and U.S. Patent Publication Nos. 2004/0033942A1, 2003/0077664, 2003/0215829 and 2003/0039980. Any of the methods described herein may be easily adapted to identify agonists or antagonists of any one of the HNF transcription factors. U.S. Pat. No. 6,303,653 describes modulators of HNF-4 activity.

Agonists and antagonists of HNF4a can also be designed based on the known crystal structure of HNF4a complexed with an endogenous fatty acid ligand (Dhe-Paganon, J. Biol. Chem. 277(41), 37973-37976). U.S. Patent Publication No. 2002/0072587 describes methods of identifying agonists of an estrogen receptor, a nuclear receptor like the HNF proteins, based on its crystal structure. Such methods may easily be applied to HNF-1a, HNF-4a and HNF6 by one skilled in the art. Additional examples of rational drug design based on the structure of a protein may be found in U.S. Pat. Nos. 6,236,946 and 6,684,162 and in U.S. Patent Publication Nos. 2004/0014153, 2003/0124699, 2003/0077628, 2002/0151028, 2002/0072587 and 2003/0211588.

6. Therapeutics

In one aspect, the invention provides methods of treating disease in a subject comprising the administration of a composition comprising a therapeutic agent. “Therapeutic agent” or “therapeutic” refers to an agent capable of having a desired biological effect on a host. Chemotherapeutic and genotoxic agents are examples of therapeutic agents that are generally known to be chemical in origin, as opposed to biological, or cause a therapeutic effect by a particular mechanism of action, respectively. Examples of therapeutic agents of biological origin include growth factors, hormones, and cytokines. A variety of therapeutic agents are known in the art and may be identified by their effects. Certain therapeutic agents are capable of regulating cell proliferation and differentiation. Examples include chemotherapeutic nucleotides, drugs, hormones, non-specific (non-antibody) proteins, oligonucleotides (e.g., antisense oligonucleotides that bind to a target nucleic acid sequence (e.g., mRNA sequence)), peptides, and peptidomimetics.

In one embodiment, the compositions are pharmaceutical compositions. Pharmaceutical compositions for use in accordance with the present invention may be formulated in conventional manner using one or more physiologically acceptable carriers or excipients. Thus, the compounds and their physiologically acceptable salts and solvates may be formulated for administration by, for example, an aerosol, intravenous, oral or topical route. The administration may comprise intralesional, intraperitoneal, subcutaneous, intramuscular or intravenous injection; infusion; liposome-mediated delivery; topical, intrathecal, gingival pocket, per rectum, intrabronchial, nasal, transmucosal, intestinal, oral, ocular or otic delivery.

An exemplary composition of the invention comprises a compound capable of modulating the expression or activity of a transcriptional regulator with a delivery system, such as a liposome system, and optionally including an acceptable excipient. In a preferred embodiment, the composition is formulated for injection.

Techniques and formulations generally may be found in Remington's Pharmaceutical Sciences, Mack Publishing Co., Easton, Pa. For systemic administration, injection is preferred, including intramuscular, intravenous, intraperitoneal, and subcutaneous. For injection, the compounds of the invention can be formulated in liquid solutions, preferably in physiologically compatible buffers such as Hank's solution or Ringer's solution. In addition, the compounds may be formulated in solid form and redissolved or suspended immediately prior to use. Lyophilized forms are also included.

For oral administration, the pharmaceutical compositions may take the form of, for example, tablets or capsules prepared by conventional means with pharmaceutically acceptable excipients such as binding agents (e.g., pregelatinised maize starch, polyvinylpyrrolidone or hydroxypropyl methylcellulose); fillers (e.g., lactose, microcrystalline cellulose or calcium hydrogen phosphate); lubricants (e.g., magnesium stearate, talc or silica); disintegrants (e.g., potato starch or sodium starch glycolate); or wetting agents (e.g., sodium lauryl sulphate). The tablets may be coated by methods well known in the art. Liquid preparations for oral administration may take the form of, for example, solutions, syrups or suspensions, or they may be presented as a dry product for constitution with water or other suitable vehicle before use. Such liquid preparations may be prepared by conventional means with pharmaceutically acceptable additives such as suspending agents (e.g., sorbitol syrup, cellulose derivatives or hydrogenated edible fats); emulsifying agents (e.g., lecithin or acacia); non-aqueous vehicles (e.g., almond oil, oily esters, ethyl alcohol or fractionated vegetable oils); and preservatives (e.g., methyl or propyl-p-hydroxybenzoates or sorbic acid). The preparations may also contain buffer salts, flavoring, coloring and sweetening agents as appropriate.

Preparations for oral administration may be suitably formulated to give controlled release of the active compound. For buccal administration the compositions may take the form of tablets or lozenges formulated in conventional manner. For administration by inhalation, the compounds for use according to the present invention are conveniently delivered in the form of an aerosol spray presentation from pressurized packs or a nebuliser, with the use of a suitable propellant, e.g., dichlorodifluoromethane, trichlorofluoromethane, dichlorotetrafluoroethane, carbon dioxide or other suitable gas. In the case of a pressurized aerosol the dosage unit may be determined by providing a valve to deliver a metered amount. Capsules and cartridges of e.g., gelatin for use in an inhaler or insufflator may be formulated containing a powder mix of the compound and a suitable powder base such as lactose or starch.

The compounds may be formulated for parenteral administration by injection, e.g., by bolus injection or continuous infusion. Formulations for injection may be presented in unit dosage form, e.g., in ampoules or in multi-dose containers, with an added preservative. The compositions may take such forms as suspensions, solutions or emulsions in oily or aqueous vehicles, and may contain formulatory agents such as suspending, stabilizing and/or dispersing agents. Alternatively, the active ingredient may be in powder form for constitution with a suitable vehicle, e.g., sterile pyrogen-free water, before use.

The compounds may also be formulated in rectal compositions such as suppositories or retention enemas, e.g., containing conventional suppository bases such as cocoa butter or other glycerides.

In addition to the formulations described previously, the compounds may also be formulated as a depot preparation. Such long acting formulations may be administered by implantation (for example subcutaneously or intramuscularly) or by intramuscular injection. Thus, for example, the compounds may be formulated with suitable polymeric or hydrophobic materials (for example as an emulsion in an acceptable oil) or ion exchange resins, or as sparingly soluble derivatives, for example, as a sparingly soluble salt.

Systemic administration can also be by transmucosal or transdermal means. For transmucosal or transdermal administration, penetrants appropriate to the barrier to be permeated are used in the formulation. Such penetrants are generally known in the art, and include, for example, for transmucosal administration bile salts and fusidic acid derivatives. In addition, detergents may be used to facilitate permeation. Transmucosal administration may be through nasal sprays or using suppositories. For topical administration, the oligomers of the invention are formulated into ointments, salves, gels, or creams as generally known in the art. A wash solution can be used locally to treat an injury or inflammation to accelerate healing.

The compositions may, if desired, be presented in a pack or dispenser device which may contain one or more unit dosage forms containing the active ingredient. The pack may for example comprise metal or plastic foil, such as a blister pack. The pack or dispenser device may be accompanied by instructions for administration.

For therapies involving the administration of nucleic acids, the oligomers of the invention can be formulated for a variety of modes of administration, including systemic and topical or localized administration. Techniques and formulations generally may be found in Remington's Pharmaceutical Sciences, Mack Publishing Co., Easton, Pa. For systemic administration, injection is preferred, including intramuscular, intravenous, intraperitoneal, intranodal, and subcutaneous. For injection, the oligomers of the invention can be formulated in liquid solutions, preferably in physiologically compatible buffers such as Hank's solution or Ringer's solution. In addition, the oligomers may be formulated in solid form and redissolved or suspended immediately prior to use. Lyophilized forms are also included.

Systemic administration can also be by transmucosal or transdermal means, or the compounds can be administered orally. For transmucosal or transdermal administration, penetrants appropriate to the barrier to be permeated are used in the formulation. Such penetrants are generally known in the art, and include, for example, for transmucosal administration bile salts and fusidic acid derivatives. In addition, detergents may be used to facilitate permeation. Transmucosal administration may be through nasal sprays or using suppositories. For oral administration, the oligomers are formulated into conventional oral administration forms such as capsules, tablets, and tonics. For topical administration, oligomers may be formulated into ointments, salves, gels, or creams as generally known in the art.

Toxicity and therapeutic efficacy of the agents and compositions of the present invention can be determined by standard pharmaceutical procedures in cell cultures or experimental animals, e.g., for determining the LD₅₀ (the dose lethal to 50% of the population) and the ED₅₀ (the dose therapeutically effective in 50% of the population). The dose ratio between toxic and therapeutic effects is the therapeutic index and it can be expressed as the ratio LD₅₀/ED₅₀. Compounds which exhibit large therapeutic indices are preferred. While compounds that exhibit toxic side effects may be used, care should be taken to design a delivery system that targets such compounds to the site of affected tissue in order to minimize potential damage to unaffected cells and, thereby, reduce side effects.
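As a worked illustration of the ratio LD₅₀/ED₅₀, the short Python sketch below computes the therapeutic index for two compounds and ranks them. The compound names and dose values are invented for the example and do not describe any agent of the invention.

```python
def therapeutic_index(ld50, ed50):
    """Return the therapeutic index, LD50 / ED50 (same units for both)."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


# Hypothetical (LD50, ED50) pairs in mg/kg.
compounds = {"compound_A": (500.0, 5.0), "compound_B": (200.0, 20.0)}
indices = {name: therapeutic_index(*doses) for name, doses in compounds.items()}

# A larger index means a wider margin between effective and lethal doses.
ranked = sorted(indices, key=indices.get, reverse=True)
print(indices)
print(ranked)
```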

The data obtained from the cell culture assays and animal studies can be used in formulating a range of dosage for use in humans. The dosage of such compounds lies preferably within a range of circulating concentrations that include the ED₅₀ with little or no toxicity. The dosage may vary within this range depending upon the dosage form employed and the route of administration utilized. For any compound used in the method of the invention, the therapeutically effective dose can be estimated initially from cell culture assays. A dose may be formulated in animal models to achieve a circulating plasma concentration range that includes the IC₅₀ (i.e., the concentration of the test compound which achieves a half-maximal inhibition of symptoms) as determined in cell culture. Such information can be used to more accurately determine useful doses in humans. Levels in plasma may be measured, for example, by high performance liquid chromatography.

In one embodiment of the methods described herein, the effective amountof the agent is between about 1 mg and about 50 mg per kg body weight ofthe subject. In one embodiment, the effective amount of the agent isbetween about 2 mg and about 40 mg per kg body weight of the subject. Inone embodiment, the effective amount of the agent is between about 3 mgand about 30 mg per kg body weight of the subject. In one embodiment,the effective amount of the agent is between about 4 mg and about 20 mgper kg body weight of the subject. In one embodiment, the effectiveamount of the agent is between about 5 mg and about 10 mg per kg bodyweight of the subject.

In one embodiment of the methods described herein, the agent isadministered at least once per day. In one embodiment, the agent isadministered daily. In one embodiment, the agent is administered everyother day. In one embodiment, the agent is administered every 6 to 8days. In one embodiment, the agent is administered weekly.

As for the amount of the compound and/or agent for administration to the subject, one skilled in the art would know how to determine the appropriate amount. As used herein, a dose or amount would be one in sufficient quantities to either inhibit the disorder, treat the disorder, treat the subject or prevent the subject from becoming afflicted with the disorder. This amount may be considered an effective amount. A person of ordinary skill in the art can perform simple titration experiments to determine what amount is required to treat the subject. The dose of the composition of the invention will vary depending on the subject and upon the particular route of administration used. In one embodiment, the dosage can range from about 0.1 to about 100,000 ug/kg body weight of the subject. Based upon the composition, the dose can be delivered continuously, such as by continuous pump, or at periodic intervals, for example, on one or more separate occasions. Desired time intervals of multiple doses of a particular composition can be determined without undue experimentation by one skilled in the art.

The effective amount may be based upon, among other things, the size ofthe compound, the biodegradability of the compound, the bioactivity ofthe compound and the bioavailability of the compound. If the compounddoes not degrade quickly, is bioavailable and highly active, a smalleramount will be required to be effective. The effective amount will beknown to one of skill in the art; it will also be dependent upon theform of the compound, the size of the compound and the bioactivity ofthe compound. One of skill in the art could routinely perform empiricalactivity tests for a compound to determine the bioactivity in bioassaysand thus determine the effective amount. In one embodiment of the abovemethods, the effective amount of the compound comprises from about 1.0ng/kg to about 100 mg/kg body weight of the subject. In anotherembodiment of the above methods, the effective amount of the compoundcomprises from about 100 ng/kg to about 50 mg/kg body weight of thesubject. In another embodiment of the above methods, the effectiveamount of the compound comprises from about 1 ug/kg to about 10 mg/kgbody weight of the subject. In another embodiment of the above methods,the effective amount of the compound comprises from about 100 ug/kg toabout 1 mg/kg body weight of the subject.

As for when the compound, compositions and/or agent is to beadministered, one skilled in the art can determine when to administersuch compound and/or agent. The administration may be constant for acertain period of time or periodic and at specific intervals. Thecompound may be delivered hourly, daily, weekly, monthly, yearly (e.g.in a time release form) or as a one time delivery. The delivery may becontinuous delivery for a period of time, e.g. intravenous delivery. Inone embodiment of the methods described herein, the agent isadministered at least once per day. In one embodiment of the methodsdescribed herein, the agent is administered daily. In one embodiment ofthe methods described herein, the agent is administered every other day.In one embodiment of the methods described herein, the agent isadministered every 6 to 8 days. In one embodiment of the methodsdescribed herein, the agent is administered weekly.

Exemplification

The invention now being generally described, it will be more readily understood by reference to the following examples, which are included merely for purposes of illustration of certain aspects and embodiments of the present invention, and are not intended to limit the invention. One skilled in the art would recognize from the teachings hereinabove and the following examples that other DNA microarrays, transcriptional regulators, cell types, antibodies, ChIP conditions, or data analysis methods, all without limitation, can be employed without departing from the scope of the invention as claimed.

The practice of the present invention will employ, where appropriate andunless otherwise indicated, conventional techniques of cell biology,cell culture, molecular biology, transgenic biology, microbiology,virology, recombinant DNA, and immunology, which are within the skill ofthe art. Such techniques are described in the literature. See, forexample, Molecular Cloning: A Laboratory Manual, 3rd Ed., ed. bySambrook and Russell (Cold Spring Harbor Laboratory Press: 2001); thetreatise, Methods In Enzymology (Academic Press, Inc., N.Y.); UsingAntibodies, Second Edition by Harlow and Lane, Cold Spring Harbor Press,New York, 1999; Current Protocols in Cell Biology, ed. by Bonifacino,Dasso, Lippincott-Schwartz, Harford, and Yamada, John Wiley and Sons,Inc., New York, 1999; and PCR Protocols, ed. by Bartlett et al., HumanaPress, 2003.

Various publications, patents, and patent publications are cited throughout this application, the contents of which are incorporated herein by reference in their entirety.

Experimental Procedures

The following procedures were followed in performing the experimentsbelow:

Genome-Scale Location Analysis

The protocol described here was adapted from Ren 2001. Briefly, cells are fixed with 1% final concentration formaldehyde for 10-20 minutes at room temperature, harvested and rinsed with 1×PBS. The resultant cell pellet is sonicated, and DNA fragments that are crosslinked to a protein of interest are enriched by immunoprecipitation with a factor-specific antibody. After reversal of the crosslinking, the enriched DNA is amplified using ligation-mediated PCR (LM-PCR), and then fluorescently labeled using high concentration Klenow polymerase and a dNTP-fluorophore. A sample of DNA that has not been enriched by immunoprecipitation is subjected to LM-PCR and labeled with a different fluorophore. Both IP-enriched and unenriched pools of labeled DNA are hybridized to a single DNA microarray containing 13,000 human intergenic regions (see below for description of DNA microarray and binding site determination). For hepatocyte experiments, 2.5×10⁷ hepatocytes were typically used per chromatin immunoprecipitation. These hepatocytes were isolated by standard liver perfusion techniques, immediately crosslinked with 1% formaldehyde solution, rinsed, and flash frozen. Islet preparations were treated with formaldehyde between 1 hour and 5 days after isolation from pancreata. A minimum of 30,000 viable islet equivalents (approximately 2×10⁷ beta cells) were fixed and handled as described above. Typical islet purity for three experiments described here was >70% islets with >80% viability. HNF4a, HNF6, and RNA polymerase II produced high quality results with as few as 30,000 islet equivalents. HNF1a ChIP required significantly more material, typically 80,000 islets, to produce results with somewhat lower enrichment ratios than the results obtained with hepatocytes.

Human 13K DNA Microarray

It would be ideal to have a DNA microarray that contains the entire human genome sequence, but technical limitations and cost led applicants to select the most relevant portion of the genome for inclusion in this microarray. Because a significant percentage of transcriptional binding sites in proximal promoters are within 1 kb of transcription start sites, applicants designed primers to amplify these genomic regions for printing onto a promoter array. Applicants selected 15000 cDNAs from the NCBI RefSeq database, and mapped them to NCBI Build 22 (April 2001) of the human genome using BLAST. Where multiple splice variants had been described, applicants used the most upstream site, and verified the 5′-end by alignment with the Database of Transcriptional Start Sites (http://elmo.ims.u-tokyo.ac.jp/dbtss/). Sequences to be amplified were extracted from the genomic region −750 bp to +250 bp relative to this transcriptional start site. To control for nonspecific binding, 9 amplified regions derived from long Arabidopsis open reading frames were included on the array. As a further negative control and for use in data normalization, applicants chose 158 ORF regions within long exons of human genes for amplification. To prepare the DNA content of the arrays, the program Primer3 (http://www-genome.wi.mit.edu/genome_software/other/primer3.html) was used to design primers using the sequences described above. PCRs were performed on these primer sets using standard conditions, except for the presence of 1 M betaine in all PCR reactions. Betaine was empirically observed to increase the success rate of the amplification reactions.
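The extraction of the −750 bp to +250 bp window can be sketched as follows. This is a minimal illustration only: the function name, the strand handling, and the coordinate convention (integer genomic positions with the window reported as start ≤ end) are assumptions and are not taken from the procedure applicants used.

```python
def promoter_window(tss: int, strand: str,
                    upstream: int = 750, downstream: int = 250) -> tuple[int, int]:
    """Return (start, end) genomic coordinates, start <= end, spanning
    `upstream` bp before the transcription start site to `downstream` bp after it."""
    if strand == "+":
        return tss - upstream, tss + downstream
    if strand == "-":
        # On the minus strand, "upstream" of the gene lies at higher coordinates.
        return tss - downstream, tss + upstream
    raise ValueError("strand must be '+' or '-'")


# Hypothetical start site at position 10,000 on each strand.
print(promoter_window(10_000, "+"))  # (9250, 10250)
print(promoter_window(10_000, "-"))  # (9750, 10750)
```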

Of the 13,000 PCR pairs, 70% gave a strong band of the appropriate size, as verified on 2% agarose gels. Applicants have noted, however, that PCR products undetectable by agarose EtBr gel analysis can give valid positive signals when concentrated and printed on the DNA arrays. PCR quality evaluations were performed on the BRIDNA suite of programs from the Biotechnology Research Institute of the National Research Council of Canada (http://www.irb-bri.cnrc-nrc.gc.ca/). PCR products were recovered from the reaction mixture by ammonium acetate/isopropanol precipitation and resuspended into 3×SSC with 1.5 M betaine to minimize evaporation and improve spot quality. Applicants printed amplified products onto GAPS-coated glass slides (Corning) using a Cartesian PixSys 5500 arrayer. The quality of the arrays was determined on a batch-wise basis by hybridization with sequence neutral oligonucleotides covalently linked to Cy3 or Cy5, followed by calculation of usable percentage of spots, combined with direct visual inspection of the quality of the chip. The Hu13K array was remapped post-production using two independent methods. First, applicants performed electronic PCR on the primer sets against the August 2003 final release of the completed human genome. Second, applicants BLASTed the sequence used to extract primers for amplification against the August 2003 final release of the human genome. The dataset downloadable from the supporting website reports the location of each arrayed promoter relative to the transcriptional start site.

Data Quality Control

1. ChIP Hybridization Quality Control

The raw data generated from each array experiment was subjected tomultiple levels of quality control. First, each scan was examinedvisually as it was being performed. Samples on microarrays with grossdefects (e.g. scratches, smeared spots) were repeated whenever possible.Applicants also determined that no reliable signal was produced fromcontrol spots containing Arabidopsis DNA.

2. Binding Site Determination and Error Model

Scanned images were analyzed using GenePix (v3.1 or v4.0), to obtainbackground subtracted intensity values. Each spot is bound by bothIP-enriched and unenriched DNA, which are labeled with differentfluorophores. Consequently, each spot yields fluorescence intensityinformation in two channels, corresponding to immunoprecipitated DNA andgenomic DNA. To account for background hybridization to slides, themedian intensity of a set of control blank spots was subtracted forsite-specific transcription factors (e.g. HNF1a), and the medianintensity for a set of control ORF spots was subtracted for broadlyacting DNA binding proteins (e.g. RNA Pol II, HNF4a). To correct fordifferent amounts of genomic and immunoprecipitated DNA hybridized tothe microarray, the median intensity value of the IP-enriched DNAchannel was divided by the median of the genomic DNA channel, and thisnormalization factor was applied to each intensity in the genomic DNAchannel. Next, applicants calculated the log of the ratio of intensityin the IP-enriched channel to intensity in the genomic DNA channel foreach intergenic region across the entire set of hybridizationexperiments. Adjusted intensity values for the IP-enriched channel werecalculated from these ratios. A whole-chip error model (Hughes 2000; Lee2002) was then used to calculate confidence values for each spot on eachmicroarray, and to combine data for the replicates of each experiment toobtain a final average ratio and confidence for each promoter region.Genes were included in the set of ‘bound’ genes if the binding P-valuein the error model was <0.001 or enrichment was at least 2-fold in theimmunoprecipitation.
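The background subtraction, channel normalization, and ratio steps described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function name is hypothetical, a base-2 logarithm is assumed because the text does not state the base, and the whole-chip error model that produces the P-values is not reproduced, so only the 2-fold enrichment criterion is shown.

```python
import math
from statistics import median


def log_ratios(ip, genomic, ip_control, genomic_control):
    """Return log2(IP / normalized genomic) per spot, or None where undefined."""
    # Subtract the median control-spot intensity from each channel.
    ip_adj = [x - median(ip_control) for x in ip]
    gen_adj = [x - median(genomic_control) for x in genomic]
    gen_median = median(gen_adj)
    if gen_median <= 0:
        raise ValueError("genomic channel median must be positive after subtraction")
    # Normalization factor: IP-channel median divided by genomic-channel median,
    # applied to every intensity in the genomic channel.
    factor = median(ip_adj) / gen_median
    gen_norm = [x * factor for x in gen_adj]
    return [math.log2(i / g) if i > 0 and g > 0 else None
            for i, g in zip(ip_adj, gen_norm)]


# Toy intensities (hypothetical values, not data from the application).
ratios = log_ratios(ip=[1100, 1100, 8100, 1100, 600],
                    genomic=[550, 550, 550, 550, 550],
                    ip_control=[100, 100, 100],
                    genomic_control=[50, 50, 50])
print(ratios)  # [0.0, 0.0, 3.0, 0.0, -1.0]

# Spots enriched at least 2-fold, i.e. log2 ratio >= 1.
bound = [i for i, r in enumerate(ratios) if r is not None and r >= 1.0]
print(bound)  # [2]
```

In practice the control intensities would come from the blank spots or the ORF spots, depending on the class of DNA binding protein, as described above.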

Confirmation of Predicted Binding

The accuracy of genome-wide location data reported here has beenassessed using several approaches.

1. Estimation of False Positive Rates Using Conventional ChIPExperiments

Conventional, independent ChIP experiments conducted in our laboratoryat a gene specific level have confirmed over 100 binding interactionsidentified by location analysis data involving 6 different regulators(see http://web.wi.mit.edu/young/pancregulators). These results suggestthat our empirical rate of false positives is at most 16%. This rate issomewhat higher than that found for a large scale survey of yeasttranscription factors (Lee 2002), which probably reflects the greatercomplexity of the human genome. FIGS. 9 and 10 show typical verificationChIP experiments for HNF4a and HNF1a, respectively, in hepatocytes.

2. Comparison with Previous Literature

Applicants found no previous studies of the genomic targets oftranscriptional regulators in primary human tissue. However, a largenumber of HNF1a and HNF4a targets have been identified in modelorganisms and human carcinoma (mostly hepatoma) cell lines; thesetargets are summarized in FIG. 14. For example, genome-scale locationanalysis identified 30 of the 68 hepatocyte genes which were bothpreviously suggested to be targets of HNF4a, and included on the 13K DNAarray. Similarly, genome-scale location analysis identified 21 of the 81hepatocyte genes which were both previously suggested to be targets ofHNF4a, and included on the 13K DNA array. Discrepancies between thetargets reported here and targets reported in the literature may resultfrom a number of factors, which include, but are not limited to: (1) thelimitations of using a 1 kb promoter fragment to probe the binding of atranscription factor, (2) the stringency of our threshold criteria, (3)the differences between the regulatory network in model organisms and/orcell lines, and the regulatory network in primary human tissue, (4)differences between indirect technologies in the literature (i.e.gel-shift and transient transfections) and genome-scale locationanalysis, (5) tissue isolation effects, among others. A morecomprehensive discussion can be found athttp://web.wi.mit.edu/young/pancregulators

Regulatory Motifs Derived from Binding Data

In order to discover network motifs, two data matrices were created. The overall matrix D consists of binary entries Dij, where a 1 indicates binding of regulator j to intergenic region i, and a 0 indicates no binding event. The regulator matrix R is a subset of D, containing only the rows corresponding to the intergenic region assigned to each regulator, in the same order as the columns of regulators. All analyses were performed in Matlab. The algorithms used to find each motif are described below.

Autoregulatory motif: Find each non-zero entry on the diagonal of R.

Feedforward loop: For each master regulator (column of R), find non-zero entries, which correspond to regulators bound. For each master regulator/secondary regulator pair, find all rows in D bound by both regulators.

Multi-component loop: For each regulator (column of R), find the regulators to which it binds. For each of these, find the regulators it binds. If any of these are the original regulator, you have a multi-component loop of two. For all others, find regulators to which they bind. If any of these are the original, you have a multi-component loop of three. Repeat to find larger loops.

Single input module: Find the intergenic regions bound by only one regulator. That is, take the subset of rows of D such that the sum of each row is 1. Then for each regulator (column), find non-zero entries. Each set (greater than three intergenic regions) is a SIM.

Multi-input module: Find the intergenic regions bound by more than one regulator. That is, take the subset of rows of D such that the sum of each row is greater than 1. Then, for each row, find any other row bound by the same regulators. The collection of rows bound by the same regulators corresponds to a MIM. Once a row is assigned to a MIM, remove it from further analysis.

Regulator chain: For each regulator (column of R), use a recursive algorithm to find chains of all lengths. That is, for each regulator whose promoter is bound by the regulator before it in the chain, find the regulator promoters to which it binds. Repeat until the chain ends. There are three possible ways to end a chain: a regulator that does not bind to the promoter of any other regulator, a regulator that binds to its own promoter, or one that binds to the promoter of another regulator earlier in the chain.
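The motif searches described above can be illustrated with the following Python sketch. It is not the Matlab code applicants used; the function names, the toy matrix, the two-row minimum for a multi-input module, and the reading of the chain-ending rule (a chain stops when the last regulator binds no regulator promoter that is not already in the chain) are assumptions made for illustration. Only loops of two are shown for the multi-component case.

```python
from collections import defaultdict


def autoregulatory(R):
    """Regulators with a non-zero entry on the diagonal of R."""
    return [j for j in range(len(R)) if R[j][j]]


def feedforward_loops(D, R):
    """Map (master, secondary) to the rows of D bound by both regulators."""
    loops = {}
    for master in range(len(R)):
        for secondary in range(len(R)):
            if secondary != master and R[secondary][master]:
                rows = [i for i, row in enumerate(D) if row[master] and row[secondary]]
                if rows:
                    loops[(master, secondary)] = rows
    return loops


def two_component_loops(R):
    """Pairs of regulators that bind one another's promoters."""
    n = len(R)
    return [(a, b) for a in range(n) for b in range(a + 1, n) if R[b][a] and R[a][b]]


def single_input_modules(D, min_rows=4):
    """Rows bound by exactly one regulator, grouped by that regulator."""
    groups = defaultdict(list)
    for i, row in enumerate(D):
        if sum(row) == 1:
            groups[next(j for j, b in enumerate(row) if b)].append(i)
    # "Greater than three intergenic regions" means at least four rows.
    return {reg: rows for reg, rows in groups.items() if len(rows) >= min_rows}


def multi_input_modules(D, min_rows=2):
    """Rows bound by more than one regulator, grouped by the identical regulator set."""
    groups = defaultdict(list)
    for i, row in enumerate(D):
        if sum(row) > 1:
            groups[tuple(j for j, b in enumerate(row) if b)].append(i)
    return {regs: rows for regs, rows in groups.items() if len(rows) >= min_rows}


def regulator_chains(R, start):
    """Recursively follow regulator-to-regulator binding from `start`."""
    chains = []

    def extend(chain):
        last = chain[-1]
        nxt = [i for i in range(len(R)) if R[i][last] and i not in chain]
        if not nxt:
            chains.append(chain)
        for i in nxt:
            extend(chain + [i])

    extend([start])
    return chains


# Toy data: 8 intergenic regions (rows) by 3 regulators (columns).
# Rows 0-2 are the promoters of regulators 0-2, so R is those rows of D.
D = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0],
     [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1]]
R = [D[r] for r in (0, 1, 2)]

print(autoregulatory(R))        # [0]
print(two_component_loops(R))   # [(0, 1)]
print(single_input_modules(D))  # {2: [4, 5, 6, 7]}
print(multi_input_modules(D))   # {(0, 1): [0, 3]}
print(regulator_chains(R, 0))   # [[0, 1, 2]]
```

Here R[i][j] equal to 1 means that regulator j binds the promoter of regulator i, matching the row and column convention given for D.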

EXAMPLE 1

The liver and pancreas have long been the subject of studies to understand how organs develop and are regulated at the transcriptional level (8-12). The transcriptional regulators HNF1α (a homeodomain protein), HNF4α (a nuclear receptor) and HNF6 (a member of the onecut family) operate cooperatively in a connected network in the liver, but less is known about the structure of this regulatory network in human pancreatic islets. All three transcriptional regulators are required for normal function of liver and pancreatic islets (13-18). Mutations in HNF1α and HNF4α are the causes of the type 3 and type 1 forms of maturity-onset diabetes of the young (MODY3 and MODY1), a genetic disorder of the insulin-secreting pancreatic beta cells characterized by onset of diabetes mellitus before 25 years of age and an autosomal dominant pattern of inheritance (19).

Applicants hypothesized that genome-scale analysis of the pancreaticislet genes whose expression is regulated by these transcription factorsin normal beta cells could provide insights into the molecular basis ofthe abnormal beta cell function that characterizes MODY. Applicants haveidentified the genes occupied by the transcription factors HNF1α, HNF4α,and HNF6 in pancreatic islets. The genes transcribed in each tissue wereidentified by determining the genomic occupancy of RNA polymerase II.Applicants used this information to begin to map the transcriptionalregulatory circuitry in these tissues.

Applicants first used genome-scale location analysis (20) to identifythe promoters bound by HNF1α in human hepatocytes and pancreatic isletsisolated from tissue donors (FIG. 1A). For each tissue, HNF1α-DNAcomplexes were enriched by chromatin immunoprecipitation in threeseparate experiments. Applicants constructed a custom DNA microarraycontaining portions of promoter regions of 13,000 human genes (Hu13Karray). Applicants targeted the region spanning 700 bp upstream and 200bp downstream of transcription start sites for the genes whose startsites are best characterized based on National Center for BiotechnologyInformation annotation (20). Although many enhancers are present at moredistant locations, most known transcription factor binding sitesequences occur within these start-site proximal regions of promoters.

The results of these genome location experiments revealed that HNF1α isbound to at least 222 target genes in hepatocytes, representing 1.6% ofthe genes on the Hu13K array (FIG. 11) (20). This result was verifiedwith independent, conventional chromatin immunoprecipitationexperiments, which suggest that the frequency of false positives ingenome-scale location data with gene-specific regulators is no more than16% when our threshold criteria were used (20). The genes applicantsfound to be occupied by HNF1α in primary human hepatocytes encodeproducts whose functions represent a significant cross-section ofhepatocyte biochemistry. The results confirm that HNF1α contributes tothe transcriptional regulation of many of the central rate-limitingsteps in gluconeogenesis and associated pathways. HNF1α also binds togenes whose products are central to normal hepatic function, includingcarbohydrate synthesis and storage, lipid metabolism (synthesis ofcholesterol and apolipoproteins), detoxification (synthesis ofcytochrome P450s) and synthesis of serum proteins (albumin, complementsand coagulation factors).

Applicants next identified HNF1α target genes in human pancreatic islets(FIG. 11) (20). HNF1α occupied the promoter regions of 106 genes (0.8%of the Hu13K array promoters) in islets, 30% of which were also bound byHNF1α in hepatocytes (FIG. 1B). In islets, fewer chaperones and enzymesare bound by HNF1α than in hepatocytes, and the receptors and signaltransduction machinery regulated by HNF1α vary between the two tissues.

HNF1α has been previously implicated in the regulation of many genes inhepatocytes and islets (13, 16, 20 [FIG. 15]). The direct genome bindingdata reported here confirmed many, but not all, of these genes. Thedifference may be due, at least in part, to our stringent criteria forbinding in the genome-scale data, which enhances our confidence in thedirect target genes identified by location analysis, but likelyunderestimates the actual number of targets in vivo. Furthermore,although the proximal promoter regions printed on the array contain asignificant number of transcription factor binding sequences, many genesare also regulated by more distal promoter elements and enhancers thatare not present on the Hu13K array.

Applicants also identified the promoters bound by HNF6 in humanhepatocytes and pancreatic islets using genome-scale location analysis(FIG. 1B; FIGS. 16 and 17) (20). HNF6 was bound to at least 222 genes inhepatocytes and 189 genes in pancreatic islets, representing 1.7% and1.4% of the promoters on the array, respectively. Approximately half ofthe promoters occupied by HNF6 were common to the two tissues, andincluded a number of important cell cycle regulators such as CDK2 (20).

Genome-scale location analysis revealed surprising results for HNF4α inhepatocytes and pancreatic islets (FIG. 1B). The number of genesenriched in HNF4α chromatin immunoprecipitations was much larger thanobserved with typical site-specific regulators. HNF4α was bound toapproximately 12% of the genes represented on the Hu13K DNA microarrayin hepatocytes and 11% in pancreatic islets. No other transcriptionfactor applicants have profiled in human cells has been observed to bindmore than 2.5% of the promoter regions represented on the 13K array.

Six independent lines of evidence indicate that the HNF4α results are not due to poor antibody specificity or errors in the microarray analysis, and support the view that HNF4α is associated with an unusually large number of promoters in hepatocytes and pancreatic islets (20). First, essentially identical results were obtained with two different antibodies that recognize different portions of HNF4α. Second, Western blots showed that the HNF4α antibodies are highly specific. Third, applicants verified binding at over 50 randomly selected targets of HNF4α in hepatocytes by conventional gene-specific chromatin immunoprecipitation. Fourth, when antibodies against HNF4α were used for ChIP in control experiments with Jurkat, U937, and BJT cells (which do not express HNF4α), no more than 17 promoters were identified in each cell line by our criteria, which is well within the noise inherent in this system. Fifth, when pre-immune antibodies from rabbit and goat (the two different anti-HNF4α antibodies came from rabbit and goat) were used in control experiments in hepatocytes, the number of targets identified was within the noise. Finally, if the HNF4α results are correct, then applicants would expect that the set of promoters bound by HNF4α should be largely a subset of those bound by RNA polymerase II in each tissue; applicants found that this is the case (see below). Applicants conclude that HNF4α is a widely acting transcription factor in these tissues, consistent with the observation that it is an unusually abundant, constitutively active transcription factor (11).

Applicants next identified the genes represented on the Hu13K microarraythat are actively transcribed in hepatocytes and pancreatic islets, sothe fraction of actively transcribed genes that are bound by HNF4α couldbe determined (FIG. 2C). It is difficult to determine accurately thetranscriptome of these tissues by profiling transcript levels with DNAmicroarrays. Transcript profiling requires a reference RNA populationagainst which a tissue RNA population can be compared, and there arelimitations to generating appropriate reference RNA. To circumvent thislimitation, applicants exploited the fact that RNA polymerase IIoccupies the set of protein-coding genes that are actively transcribedin eukaryotic cells. Location analysis with RNA polymerase II antibodiescan identify these actively transcribed genes (7, 21). Applicants foundthat 23% of the genes on the Hu13K array (2984 genes) were bound by RNApolymerase II in hepatocytes, and 19% (2426 genes) were bound by RNApolymerase II in islets (20). The sets of genes occupied by RNApolymerase II in hepatocytes and islets overlapped substantially (81%overlap, relative to islets), consistent with the relatedness of the twotissues (22). As expected, the majority of genes occupied by HNF4α inhepatocytes and pancreatic islets (80% and 73%, respectively) were alsooccupied by RNA polymerase II. Remarkably, of the genes occupied by RNApolymerase II, 42% (1262/2984) were bound by HNF4α in hepatocytes and43% (1047/2426) were bound by HNF4α in islets (FIG. 1C). By comparison,only 6% and 2% of RNA polymerase II enriched promoters were also boundby HNF1α in hepatocytes and islets, respectively.
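The overlap percentages reported above are simple set fractions. The following Python sketch shows the calculation on toy gene sets and then checks the arithmetic for the counts stated in the text; the function name and the gene labels are hypothetical.

```python
def percent_overlap(reference: set, other: set) -> float:
    """Percentage of genes in `reference` that are also present in `other`."""
    if not reference:
        raise ValueError("reference set is empty")
    return 100.0 * len(reference & other) / len(reference)


# Toy gene sets (hypothetical labels, not data from the application).
pol2_bound = {"geneA", "geneB", "geneC", "geneD"}
hnf4a_bound = {"geneB", "geneC", "geneE"}
print(percent_overlap(pol2_bound, hnf4a_bound))  # 50.0

# Arithmetic for the counts reported in the text.
print(round(100 * 1262 / 2984))  # 42
print(round(100 * 1047 / 2426))  # 43
```

The choice of reference set matters: the fraction of RNA polymerase II bound genes that are also bound by HNF4α differs from the fraction of HNF4α bound genes that are also bound by RNA polymerase II.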

Previous studies indicate that HNF1α, HNF4α, and HNF6 are at the centerof a network of transcription factors that cooperatively regulatenumerous developmental and metabolic functions in hepatocytes and islets(9, 13, 15, 17). Our systematic analysis of the direct in vivo targetsof these factors significantly expands our understanding of theregulatory network in primary human tissues (FIG. 2A). A comparison ofthe regulatory network in these two tissues reveals that HNF1α, HNF4α,and HNF6 occupy the promoters of genes encoding a large population oftranscription factors and cofactors in the two tissues (20). The preciseset of transcription factor genes occupied by HNF1α, HNF4α, and HNF6,and the extent to which they are co-occupied by the HNF regulators,differed substantially between these two tissues.

The transcription factor binding data was used to identify regulatorynetwork motifs, simple units of transcriptional regulatory networkarchitecture that suggest mechanistic models (FIG. 2B) (4, 23). Our dataconfirm previous reports that HNF1α and HNF4α occupy one another'spromoters in both hepatocytes and islets, forming a multi-component loop(24-26). Multicomponent loops provide the capacity for feedback controland produce bistable systems that can switch between two alternatestates (23). It has been suggested that the multicomponent loop presentbetween HNF1α and HNF4α is responsible for stabilization of the terminalphenotype in pancreatic beta cells (26). Applicants also found that HNF6serves as a master regulator for feedforward motifs in hepatocytes andpancreatic islets involving over 80 genes in each tissue (FIGS. 20 and22). For example, in hepatocytes, HNF6 binds the HNF4α7 promoter, andHNF6 and HNF4α together bind PCK1, which encodes phosphoenolpyruvatecarboxykinase, an enzyme key to gluconeogenesis (FIG. 2B). A feedforwardloop can act as a switch designed to be sensitive to sustained, ratherthan transient, inputs (23). HNF1α, HNF4α and HNF6 were also found toform multi-input motifs by collectively binding to sets of genes inhepatocytes and islets. This regulatory motif suggests coordination ofgene expression through multiple input signals. Applicants also foundthat HNF6, HNF4α, and HNF1α form a regulator chain motif with THRA(NR1D1); regulator chain motifs represent the simplest circuit logic forordering transcriptional events in a temporal sequence (4, 23).Additional examples of these regulatory motifs can be found in FIGS. 20and 23 (20). FIGS. 20-24, panels A and B, show transcriptionalregulators occupied by HNF transcription factors and their regulatoryloops. FIGS. 4-10 show additional controls and data generated by theexperiments described herein.

Our results suggest that the nuclear hormone receptor HNF4α contributesto regulation of a large fraction of the liver and pancreatic islettranscriptomes by binding directly to almost half of the activelytranscribed genes. This likely explains why HNF4α is crucial fordevelopment and proper function of these tissues (12-15, 17, 18).Perhaps most importantly, our results suggest a mechanistic explanationfor the recent discovery that polymorphisms in the islet-specific P2promoter for the splice variant HNF4α7 can greatly increase the risk oftype II diabetes (27-30). Applicants found that multiple HNF factorsbind directly to the P2 promoter in primary, healthy human islets.Alterations in the binding sites for these factors could causemisregulation of HNF4α expression and thus its downstream targets,leading to beta cell malfunction and diabetes.

REFERENCES FOR EXPERIMENTAL SECTION

-   1. R. G. Roeder. Cold Spring Harb Symp Quant Biol 63, 201 (1998).
-   2. T. I. Lee, R. A. Young. Annu Rev Genet 34, 77 (2000).
-   3. G. Orphanides, D. Reinberg. Cell 108, 439 (2002).
-   4. T. I. Lee, et al. Science 298, 799 (2002).
-   5. B. Ren, et al. Genes Dev 16, 245 (2002).
-   6. A. S. Weinmann, et al. Genes Dev 16, 235 (2002).
-   7. Z. Li, et al. Proc Natl Acad Sci USA 100, 8164 (2003).
-   8. E. Lai, J. E. Darnell, Jr. Trends Biochem Sci 16, 427 (1991).
-   9. C. J. Kuo, et al. Nature 355, 457 (1992).
-   10. M. Pontoglio, et al. Cell 84, 575 (1996).
-   11. F. M. Sladek, S. D. Seidel, in Nuclear Receptors and Genetic Disease. T. P. Burris, Ed. (Academic Press, New York, 2001).
-   12. F. Parviz, et al. Nat Genet 34, 292 (2003).
-   13. R. H. Costa, et al. Hepatology 38, 1331 (2003).
-   14. D. Q. Shih, et al. Diabetes 50, 2472 (2001).
-   15. D. Q. Shih, M. Stoffel. Proc Natl Acad Sci USA 98, 14189 (2001).
-   16. K. S. Zaret. Nat Rev Genet 3, 499 (2002).
-   17. P. Jacquemin, et al. Dev Biol 258, 105 (2003).
-   18. S. S. Fajans, et al. N Engl J Med 345, 971 (2001).
-   19. See supporting data on Science Online, and additional information is available at the authors' website: http://web.wi.mit.edu/young/pancregulators
-   20. H. H. Ng, F. Robert, R. A. Young, K. Struhl. Genes Dev 16, 806 (2002).
-   21. R. Bort, K. Zaret. Nat Genet 32, 85 (2002).
-   22. R. Milo, et al. Science 298, 824 (2002).
-   23. S. F. Boj, et al. Proc Natl Acad Sci USA 98, 14481 (2001).
-   24. H. Thomas, et al. Hum Mol Genet 10, 2089 (2001).
-   25. J. Ferrer. Diabetes 51, 2355 (2002).
-   26. I. Barroso, et al. PLoS Biology 1, 41 (2003).
-   27. Q. Zhu, et al. Diabetologia 46, 567 (2003).
-   28. L. Love-Gregory, et al. Diabetes 54 (2004) in press.
-   29. K. Silander, et al. Diabetes 54 (2004) in press.

1. A method of determining which genes from a subset of genes areregulated by a transcriptional regulator expressed in a cell, the methodcomprising (a) selectively isolating chromatin from the cell to generateisolated chromatin; (b) selectively isolating chromatin fragments fromthe isolated chromatin to generate bound chromatin fragments, whereinthe bound chromatin fragments are bound by the transcriptionalregulator; (c) amplifying both the bound chromatin fragments to generateamplified chromatin fragments and the isolated chromatin to generateamplified control chromatin; (d) hybridizing the amplified controlchromatin and the amplified chromatin fragments to a DNA microarray,wherein the DNA microarray comprises (1) at least 10,000 experimentalspots, each experimental spot comprising an experimental DNA, eachexperimental DNA comprising a promoter region from a gene in the subset;and (2) at least 100 control spots, each control spot comprising acontrol DNA, each control DNA comprising a non-promoter region; and (e)determining and comparing a hybridization signal at each of the spots onthe microarray between those generated by (1) the amplified controlchromatin; and (2) the amplified chromatin fragments; wherein a gene inthe subset is said to be regulated by the transcriptional regulator inthe cell if a spot comprising a promoter region of said gene displays ahigher level of hybridization by the amplified chromatin fragments thanby the amplified control chromatin.
 2. The method of claim 1, wherein the level of hybridization of the amplified chromatin fragments to each experimental spot is normalized by the level of hybridization of the amplified chromatin fragments to the control spots.
 3. The method of claim 1, wherein the level of hybridization of the amplified chromatin fragments to each experimental spot is normalized by subtracting the mean level of hybridization of the amplified chromatin fragments to the control spots.
 4. The method of claim 1, wherein the higher level of hybridization comprises at least a two-fold higher level of hybridization.
 5-11. (canceled)
 12. The method of claim 1, wherein the promoter region of the gene comprises from at least 700 bp upstream to at least 200 bp downstream of the transcriptional start site of the gene.
 13. The method of claim 1, wherein the promoter region comprises at least 30, 40, 50, or 60 nucleotides in length.
 14. The method of claim 1, wherein the promoter region of the gene comprises a sequence of at least 30 nucleotides whose sequence is identical to a region stretching from 3 kb upstream to 1 kb downstream of the transcriptional start site of said gene.
 15-17. (canceled)
 18. A method of identifying a transcriptional regulatory network in a cell, the method comprising determining if a transcriptional regulator regulates additional transcriptional regulators in the cell using the method of claim 1, wherein a transcriptional regulatory network is identified if at least one additional transcriptional regulator is determined to be regulated by the transcriptional regulator.
 19. The method of claim 18, wherein the experimental DNA comprises promoter regions from the additional transcriptional regulators.
 20. A method of identifying a transcriptional regulatory network in a cell, the method comprising determining if a transcriptional regulator regulates (i) its own promoter; or (ii) a promoter from a plurality of transcriptional regulators, using the method of claim 1, wherein the experimental DNA comprises (a) a promoter from the transcriptional regulator; and (b) promoters from the plurality of transcriptional regulators; wherein a transcriptional regulatory network is identified if the transcriptional regulator regulates itself or if it regulates at least one of the plurality of transcriptional regulators.
 21. A method of identifying transcriptional regulatory networks in a cell, the method comprising (a) determining, by repeating the method of claim 1 for each of a plurality of transcriptional regulators, the genes in a subset which are regulated by each of the plurality of transcriptional regulators, wherein the experimental DNA comprises promoter regions for each of the plurality of transcriptional regulators; (b) determining if any one of the plurality of transcriptional regulators is regulated by at least one of the plurality of transcriptional regulators; wherein a transcriptional regulatory network is identified if any one of the plurality of transcriptional regulators is regulated by at least one of the plurality of transcriptional regulators.
 22. The method of claim 21, further comprising determining if a gene is regulated by more than one of the plurality of transcriptional regulators.
 23. A DNA microarray for determining promoter occupancy in a human cell, the microarray comprising (1) at least 10,000 experimental spots, each experimental spot comprising an experimental DNA, each experimental DNA comprising a promoter region from a human gene in the subset; and (2) at least 100 control spots, each control spot comprising a control DNA, each control DNA comprising a non-promoter region; wherein at least 75% of the promoter regions comprise from at least 700 bp upstream to at least 200 bp downstream of the transcriptional start site.
 24. A method of estimating if a transcriptional regulator is a global transcriptional regulator, the method comprising (a) selectively isolating chromatin from a tissue; (b) identifying promoter regions from the chromatin which are bound by a candidate global transcriptional regulator; (c) identifying promoter regions from the chromatin which are bound by a member of the basal transcriptional machinery; and (d) comparing the promoter regions identified in steps (b) and (c) to determine the ratio between (i) the number of promoter regions bound by both the candidate global transcriptional regulator and the member of the basal transcriptional machinery; and (ii) the number of promoter regions bound by the member of the basal transcriptional machinery; wherein a transcriptional regulator is a global transcriptional regulator when the ratio is greater than 0.2.
 25. The method of claim 24, wherein steps (b) and (c) are performed using a DNA microarray.
 26. The method of claim 25, wherein the DNA microarray comprises (i) at least 10,000 experimental spots, each experimental spot comprising an experimental DNA, each experimental DNA comprising a promoter region from a human gene in the subset; and (ii) at least 100 control spots, each control spot comprising a control DNA, each control DNA comprising a non-promoter region.
 27-31. (canceled)
 32. A method of identifying at least one target gene for the development of a therapeutic to treat or prevent a disorder in a subject, wherein at least one form of the disorder is caused by an altered activity in a transcriptional regulator or in a suspected transcriptional regulator, the method comprising (a) identifying the genes regulated by the transcriptional regulator in a cell; (b) determining if the transcriptional regulator is a broad-acting transcriptional regulator or a narrow-acting transcriptional regulator, wherein if the transcriptional regulator is a broad acting transcriptional regulator then the transcriptional regulator is a target gene for the development of a therapeutic, and wherein if the transcriptional regulator is a narrow acting transcriptional regulator then (i) determining if at least one gene regulated by the transcriptional regulator is likely causative in the disorder, wherein a gene that is likely causative in the disorder is a target gene for the development of a therapeutic; and (ii) reiterating steps (a) and (b) for at least one gene that is regulated by the transcriptional regulator in the cell and that either (1) encodes a transcriptional regulator or (2) is suspected to encode a transcriptional regulator, with the modification that the transcriptional regulator of steps (a) and (b) is said gene, thereby identifying at least one target gene for the development of a therapeutic to treat or prevent a disorder in the subject.
 33. The method of claim 32, wherein identifying the genes regulated by the transcriptional regulator in a cell comprises chromosome-wide location analysis.
 34. The method of claim 32, wherein identifying the genes regulated by the transcriptional regulator in the cell comprises using the method of claim 1.
 35-38. (canceled)
 39. The method of claim 32, wherein the broad acting gene regulates at least about 2.5% of the genes in the cell, and wherein the narrow acting gene regulates less than about 2.5% of the genes in the cell.
 40. The method of claim 32, wherein the gene is suspected to encode a transcriptional regulator if it shares at least 30% amino acid sequence identity with the DNA binding domain of a transcriptional regulator.
 41. The method of claim 32, wherein the transcriptional regulator in the cell is a mutant transcriptional regulator.
 42. The method of claim 32, wherein the transcriptional regulator in the cell has altered activity.
 43. The method of claim 32, wherein the gene regulated by the transcriptional regulator is likely causative of the disorder when a mutation in the gene results in at least one phenotype or symptom associated with the disorder.
 44. The method of claim 32, wherein the gene regulated by the transcriptional regulator is likely causative of the disorder when the gene encodes an enzyme or signaling molecule which functions in a pathway that is impaired in the disorder.
 45. The method of claim 32, wherein the altered activity in the transcriptional regulator comprises at least one of the following: (a) an alteration in the binding affinity of the transcriptional regulator to DNA; (b) an alteration in the ability of the transcriptional regulator to bind to RNA polymerase, to an RNA polymerase holoenzyme, or to a second transcriptional regulator; (c) an alteration in the binding affinity of the transcriptional regulator to a ligand; (d) an alteration in expression level or expression pattern of the transcriptional regulator; or (e) an alteration in an ability of the transcriptional regulator to form homomultimers or heteromultimers.
 46-53. (canceled)
 54. A method of treating or preventing type II diabetes in a subject, comprising administering to the subject a therapeutically effective amount of an agent that increases the global transcriptional activity of HNF4alpha.
 55. A method of treating or preventing a disorder associated with low transcriptional activity of HNF4alpha in a subject, comprising administering to the subject a therapeutically effective amount of an agent that increases the global transcriptional activity of HNF4alpha.
 56-64. (canceled)
 65. A method of identifying transcriptionally active genes that are regulated by a transcriptional regulator in a cell, the method comprising (a) selectively isolating chromatin from a tissue; (b) identifying promoter regions from the chromatin that are bound by the transcriptional regulator; (c) identifying promoter regions from the chromatin that are bound by a member of the basal transcriptional machinery; and (d) comparing the promoter regions identified in steps (b) and (c) to determine overlapping genes, wherein the overlapping genes are transcriptionally active genes regulated by the transcriptional regulator.
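
The comparisons recited in claims 1, 3, 4, 24 and 65 can be illustrated with a minimal sketch. It is not the inventors' analysis pipeline. The function names, the dictionary layout of the spot intensities, the example values, and the choice to combine the mean-subtraction of claim 3 with the two-fold threshold of claim 4 in a single call are all assumptions made for illustration only.

```python
from statistics import mean


def call_bound_promoters(ip_signal, control_signal, control_spots, fold=2.0):
    """Return experimental spots enriched in the bound (IP) channel.

    ip_signal / control_signal: dict mapping spot id -> hybridization
    intensity for the amplified chromatin fragments and the amplified
    control chromatin respectively (hypothetical data layout).
    control_spots: ids of the non-promoter control spots.
    """
    # Claim 3: subtract the mean IP signal measured at the control spots.
    background = mean(ip_signal[s] for s in control_spots)
    bound = set()
    for spot, ip in ip_signal.items():
        if spot in control_spots:
            continue  # control spots are only used for normalization
        normalized = ip - background
        reference = control_signal[spot]
        # Claim 4: require at least a two-fold higher level (assumed to be
        # applied to the background-subtracted IP signal).
        if reference > 0 and normalized >= fold * reference:
            bound.add(spot)
    return bound


def active_regulated_genes(regulator_bound, basal_bound):
    """Claim 65: promoters bound by both the regulator and basal machinery."""
    return regulator_bound & basal_bound


def global_regulator_ratio(regulator_bound, basal_bound):
    """Claim 24: (bound by both) / (bound by the basal machinery member)."""
    if not basal_bound:
        raise ValueError("no promoters bound by the basal machinery member")
    return len(regulator_bound & basal_bound) / len(basal_bound)


# Made-up intensities for three promoter spots and two control spots.
ip = {"geneA": 900, "geneB": 250, "geneC": 1300, "neg1": 90, "neg2": 110}
ctrl = {"geneA": 300, "geneB": 200, "geneC": 700, "neg1": 100, "neg2": 100}
bound = call_bound_promoters(ip, ctrl, control_spots={"neg1", "neg2"})

# Made-up promoter sets for a candidate regulator and a basal factor.
regulator_set = {"geneA", "geneD"}
basal_set = {"geneA", "geneB", "geneC", "geneD"}
ratio = global_regulator_ratio(regulator_set, basal_set)
is_global = ratio > 0.2  # threshold recited in claim 24
```

With these made-up values only geneA passes the two-fold test, and the candidate regulator overlaps half of the promoters bound by the basal factor, so it would be classed as global under the 0.2 threshold of claim 24.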